<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>James Lindsay&#039;s Portfolio</title>
	<atom:link href="http://www.jamesrlindsay.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.jamesrlindsay.com</link>
	<description>Computers and Genetics</description>
	<lastBuildDate>Mon, 07 May 2012 23:23:16 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Innovation Quest Challenge</title>
		<link>http://www.jamesrlindsay.com/2012/04/innovation-quest-challenge/</link>
		<comments>http://www.jamesrlindsay.com/2012/04/innovation-quest-challenge/#comments</comments>
		<pubDate>Wed, 25 Apr 2012 21:57:08 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[announcements]]></category>

		<guid isPermaLink="false">http://www.jamesrlindsay.com/?p=127</guid>
		<description><![CDATA[My team took 2nd place in the inaugural Innovation Quest (iQ) challenge at UCONN. Stay tuned for more details regarding the company! Edit to include a link from the University press! http://today.uconn.edu/blog/2012/04/a-big-boost-for-budding-entrepreneurs/]]></description>
			<content:encoded><![CDATA[<p>My team took 2nd place in the inaugural Innovation Quest (iQ) challenge at UCONN. Stay tuned for more details regarding the company!</p>
<p>Edit to include a link from the University press!</p>
<p><a title="SIBOP press" href="http://today.uconn.edu/blog/2012/04/a-big-boost-for-budding-entrepreneurs/">http://today.uconn.edu/blog/2012/04/a-big-boost-for-budding-entrepreneurs/</a></p>
<div id="attachment_128" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.jamesrlindsay.com/wp-content/uploads/2012/04/check.jpg"><img class="size-medium wp-image-128" title="fake_check" src="http://www.jamesrlindsay.com/wp-content/uploads/2012/04/check-300x223.jpg" alt="Innovation Quest Prize" width="300" height="223" /></a><p class="wp-caption-text">2nd place in the Innovation Quest challenge</p></div>
]]></content:encoded>
			<wfw:commentRss>http://www.jamesrlindsay.com/2012/04/innovation-quest-challenge/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Repeat Screening</title>
		<link>http://www.jamesrlindsay.com/2012/02/repeat-screening/</link>
		<comments>http://www.jamesrlindsay.com/2012/02/repeat-screening/#comments</comments>
		<pubDate>Thu, 09 Feb 2012 21:45:58 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[repeats]]></category>
		<category><![CDATA[repeat masker repeat modeler]]></category>

		<guid isPermaLink="false">http://www.jamesrlindsay.com/?p=115</guid>
		<description><![CDATA[A common set of questions I receive is, how do I: screen my sequence of interest against the repeatome of an organism. identify if my sequence of interest has new repeats. These two questions can be answered from a biologist&#8217;s perspective &#8230; <a href="http://www.jamesrlindsay.com/2012/02/repeat-screening/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>A common set of questions I receive is, how do I:</p>
<ol>
<li>screen my sequence of interest against the <em>repeatome</em> of an organism.</li>
<li>identify if my sequence of interest has new repeats.</li>
</ol>
<p>These two questions can be answered from a biologist&#8217;s perspective using existing tools. A computer scientist might find some interesting problems associated with these questions, but that would be beyond the scope of this short post.</p>
<p><a href="http://www.jamesrlindsay.com/wp-content/uploads/2012/02/gb-2010-11-1-r1-s11.png"><img class="alignright size-medium wp-image-123" title="gb-2010-11-1-r1-s11" src="http://www.jamesrlindsay.com/wp-content/uploads/2012/02/gb-2010-11-1-r1-s11-300x300.png" alt="" width="300" height="300" /></a></p>
<p>Professor Wikipedia gives a great definition of the term I just used, <a title="repeatome" href="http://en.wikipedia.org/wiki/Repeatome"><em>repeatome</em></a> (sarcasm); however, it does link to a paper which may give you better background. In essence, the <em>repeatome</em> is the catalog of repetitive DNA sequences in a genome. The repeatome (no more italics because you should know it by now) can be constructed using two paradigms.</p>
<ol>
<li>comparative biology, or using a database of known repeats to annotate the unknown sequence. The most common tool used is <a href="http://www.repeatmasker.org/">RepeatMasker</a>.</li>
<li>de novo, which uses some type of algorithm and k-mer frequency to find repetitive regions. The most common tool is <a href="http://www.repeatmasker.org/RepeatModeler.html">RepeatModeler</a>.</li>
</ol>
<p>The two approaches are different, but each yields a repeatome, although they won&#8217;t always agree. RepeatMasker uses a database, usually <a href="http://www.girinst.org/repbase/">Repbase</a>, and a mapping tool like BLAST to annotate the supplied sequence. RepeatModeler first identifies repeat motifs using various algorithms, and the resulting database is then supplied to RepeatMasker to annotate the genome.</p>
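<p>To make the k-mer frequency idea behind the de novo approach a little more concrete, here is a toy Python sketch. It is only an illustration under simple assumptions (the sequence and the choice of k = 4 are made up for the example), and it is not how RepeatModeler works internally. It just counts every k-mer in a sequence and reports the ones that occur more than once.</p>

```python
from collections import Counter


def repeated_kmers(sequence, k):
    """Return a dict of k-mers that occur more than once, with their counts."""
    # Slide a window of width k along the sequence and count each k-mer.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n > 1}


# Hypothetical toy sequence, not real data.
toy_sequence = "ACGTACGTTTGACGT"
print(repeated_kmers(toy_sequence, 4))
```

<p>Real de novo tools have to cope with inexact copies and much longer sequences, so treat this purely as a picture of the counting step.</p>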
<h2>Answers</h2>
<p>The questions are very similar and we can answer them using RepeatMasker and then RepeatModeler. First we want to know whether the query overlaps with the existing repeatome. If the sequence is small, there is a <a href="http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker">web interface for RepeatMasker</a>. If the sequence is large then it must be run on the command line like so:</p>
<pre>RepeatMasker -gff sequence.fasta</pre>
<p>Assuming you have RepeatMasker and Repbase installed and configured this command will annotate &#8220;sequence.fasta&#8221; and place several files in the directory where the command was executed. The two important files are:</p>
<ul>
<li>sequence.fasta.masked</li>
<li>sequence.fasta.gff</li>
</ul>
<p>The masked file is the FASTA file with the repeats masked out: by default they are replaced with N characters, or written as lowercase letters if RepeatMasker is run with the -xsmall option. The <a href="http://genome.ucsc.edu/FAQ/FAQformat#format3">GFF file is the annotation file</a>, which tells which bases are considered to be repeats and what those repeats are. This should answer question 1. To discover new repeats for question 2, we need to use RepeatModeler.</p>
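<p>As a rough sketch of how the GFF output could be read programmatically: GFF is a tab-separated format with nine columns (sequence name, source, feature, start, end, score, strand, frame, attributes), and comment lines start with "#". The two sample records below are made up for illustration and are not real RepeatMasker output. With a real run you would pass in the lines of sequence.fasta.gff instead.</p>

```python
def parse_gff(lines):
    """Yield (seqname, start, end, attributes) for each annotation line."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GFF record
        yield fields[0], int(fields[3]), int(fields[4]), fields[8]


# Hypothetical sample records (values invented for this example).
sample_lines = [
    "##gff-version 2",
    "\t".join(["seq1", "RepeatMasker", "similarity", "101", "250",
               "12.5", "+", ".", 'Target "Motif:AluY" 1 150']),
    "\t".join(["seq1", "RepeatMasker", "similarity", "400", "459",
               "20.1", "-", ".", 'Target "Motif:L1" 10 69']),
]

repeats = list(parse_gff(sample_lines))
# GFF coordinates are 1-based and inclusive, hence the "+ 1".
annotated_bases = sum(end - start + 1 for _, start, end, _ in repeats)
print(len(repeats), "repeats covering", annotated_bases, "bases")
```

<p>The simple sum assumes the annotated intervals do not overlap, which is good enough for a first look.</p>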
<p>If you wish to run RepeatModeler, the following commands will be used:</p>
<pre>/opt/RepeatModeler/BuildDatabase -name sequence sequence.fasta
RepeatModeler -database sequence
RepeatMasker -gff -lib RMXYZ/consensi.fa.classified sequence.fasta</pre>
<p>The first command creates a database for the RepeatModeler tools. The second command executes the de novo repeat finding tools and generates a database of repeat motifs in the file &#8220;consensi.fa.classified&#8221;. This file is then used as a database for RepeatMasker. If there are entries in the GFF output by RepeatMasker, then those are the de novo repeats in your sequence.</p>
<p>Hope this helps!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jamesrlindsay.com/2012/02/repeat-screening/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Startup Weekend: Buses2</title>
		<link>http://www.jamesrlindsay.com/2012/01/startup-weekend-buses2/</link>
		<comments>http://www.jamesrlindsay.com/2012/01/startup-weekend-buses2/#comments</comments>
		<pubDate>Mon, 30 Jan 2012 22:08:24 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.jamesrlindsay.com/?p=110</guid>
		<description><![CDATA[I just participated in Startup Weekend Storrs, an event that brings together entrepreneurial developers, business folk and designers to try and start a business in 54 hours! The teams are chosen based on 60-second elevator pitches (no props); unfortunately &#8230; <a href="http://www.jamesrlindsay.com/2012/01/startup-weekend-buses2/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I just participated in <a title="startup weekend storrs" href="http://storrs.startupweekend.org/">Startup Weekend Storrs</a>, an event that brings together entrepreneurial developers, business folk and designers to try and start a business in 54 hours!</p>
<div id="attachment_111" class="wp-caption alignright" style="width: 310px"><a href="http://www.jamesrlindsay.com/wp-content/uploads/2012/01/startup_weekend.jpg"><img class="size-medium wp-image-111" title="Live QR code demo" src="http://www.jamesrlindsay.com/wp-content/uploads/2012/01/startup_weekend-300x225.jpg" alt="Thats me" width="300" height="225" /></a><p class="wp-caption-text">Me during the final presentation</p></div>
<p>The teams are chosen based on 60-second elevator pitches (no props). Unfortunately, I was unable to convey the awesomeness of my idea for a better public genetics database. Also unfortunately, the team I signed up with was disqualified from placing because they had already made too much money (nice problem to have). Despite all that, the weekend was a great success and I look forward to doing it again.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jamesrlindsay.com/2012/01/startup-weekend-buses2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Genome Into the Computer</title>
		<link>http://www.jamesrlindsay.com/2011/10/the-genome/</link>
		<comments>http://www.jamesrlindsay.com/2011/10/the-genome/#comments</comments>
		<pubDate>Fri, 14 Oct 2011 06:01:58 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[assembly and scaffolding]]></category>
		<category><![CDATA[the computer and the genome]]></category>
		<category><![CDATA[assembly]]></category>
		<category><![CDATA[genome]]></category>
		<category><![CDATA[overview]]></category>
		<category><![CDATA[scaffolding]]></category>

		<guid isPermaLink="false">http://www.jamesrlindsay.com/?p=52</guid>
		<description><![CDATA[What is a genome? A genome is the sequence of DNA that makes up the genetic code of every living organism. In real life DNA is made up of simple chemical compounds bound together sequentially. In the abstract world of &#8230; <a href="http://www.jamesrlindsay.com/2011/10/the-genome/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<h1>What is a genome?</h1>
<p>A genome is the sequence of DNA that makes up the genetic code of every living organism. In real life, DNA is made up of simple chemical compounds bound together sequentially. In the abstract world of the computer, DNA is represented as a <em>string</em> of characters (for example &#8220;ACGGTGCT&#8221;).</p>
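<p>As a small illustration of what a &#8220;string of characters&#8221; means in practice, here is a short Python sketch using the example string above. It only measures the string and counts each base, and it relies on nothing beyond the Python standard library.</p>

```python
from collections import Counter

dna = "ACGGTGCT"

# The length of the string is the number of bases.
print(len(dna))

# Count how often each base occurs.
base_counts = Counter(dna)
for base in "ACGT":
    print(base, base_counts[base])
```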
<div id="attachment_67" class="wp-caption alignleft" style="width: 272px"><a href="http://www.jamesrlindsay.com/wp-content/uploads/2011/08/genome.gif"><img class="size-medium wp-image-67" title="genome" src="http://www.jamesrlindsay.com/wp-content/uploads/2011/08/genome-262x300.gif" alt="what is a genome" width="262" height="300" /></a><p class="wp-caption-text">simple view of a genome</p></div>
<p>The genome is the underlying medium upon which all information about an organism is stored. There are many resources in books and on the web to help you gain a more complete understanding of the biological significance of a genome (<a title="wikipedia" href="http://en.wikipedia.org/wiki/Genome" target="_blank">wikipedia</a>, <a title="ncbi" href="http://www.ncbi.nlm.nih.gov/About/primer/genetics_genome.html" target="_blank">ncbi</a>, <a title="tour of genome" href="http://www.dnai.org/c/" target="_blank">a tour</a>). However the focus of these posts is not going to be about what the genome does, but rather about what the genome is and how we obtain and work with the genome on a computer.</p>
<h2>Obtain a genome &#8230; IRL</h2>
<p>When I mention that I work with genomes on a computer, most people think I&#8217;m crazy and don&#8217;t quite understand how I can work with something found inside a cell, on a Dell (check out that rhyme). Fortunately there is a plethora of resources out there that can explain how we obtain a genome in real life.</p>
<p><a title="wikipedia" href="http://en.wikipedia.org/wiki/DNA_sequencing" target="_blank">Professor Wikipedia</a> is always a good place to start when trying to understand a new topic, and that link provides a somewhat technical overview of DNA sequencing. The US government has been heavily involved in many of the earliest and largest DNA sequencing projects, and the website <a title="govt overview" href="http://www.genome.gov/10001177" target="_blank">genome.gov</a> provides a concise textual introduction to genome sequencing.</p>
<p>Now for a quick overview in my own words: the process of reading DNA from a cell and transferring that information to a computer is known as &#8220;sequencing&#8221;. The ability to sequence DNA has been around for almost 30 years, but the techniques and the speed at which it is done have changed drastically since its inception.</p>
<p>From the 1990s to the early 2000s DNA sequencing was very expensive, and sequencing the genome of a large mammal (like the human) cost hundreds of millions of dollars. Several for-profit companies began automating the processes and developing new techniques. Now it is possible to sequence the human genome for about 10-20 thousand dollars.</p>
<div id="attachment_81" class="wp-caption alignright" style="width: 310px"><a href="http://www.jamesrlindsay.com/wp-content/uploads/2011/08/sequencing_ink.png"><img class="size-medium wp-image-81" title="sequencing overview" src="http://www.jamesrlindsay.com/wp-content/uploads/2011/08/sequencing_ink-300x236.png" alt="overview of DNA sequencing" width="300" height="236" /></a><p class="wp-caption-text">DNA sequencing</p></div>
<p>I&#8217;ve created a simple graphic to explain how sequencing is done. In essence it&#8217;s a 3-step process. First, the DNA is isolated from a cell (or cells) and loaded into the machine. Next, the machine uses some tricks to get each base in the DNA to emit a color and takes a whole bunch of pictures. Finally, the image files are processed, an algorithm figures out what bases they represent, and the string of DNA bases is written to a file on a computer.</p>
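<p>The last step can be pictured with a toy Python sketch. Everything here is a simplifying assumption made for illustration: each cycle is reduced to four made-up brightness values, one per base, and the brightest one is taken as the base call. Real base-calling software is far more involved.</p>

```python
def call_bases(cycles):
    """Pick the base with the highest intensity in each cycle."""
    read = []
    for intensities in cycles:
        # intensities maps a base letter to its (hypothetical) brightness.
        best_base = max(intensities, key=intensities.get)
        read.append(best_base)
    return "".join(read)


# Invented intensity values for four sequencing cycles.
cycles = [
    {"A": 0.91, "C": 0.03, "G": 0.04, "T": 0.02},
    {"A": 0.05, "C": 0.88, "G": 0.04, "T": 0.03},
    {"A": 0.02, "C": 0.06, "G": 0.90, "T": 0.02},
    {"A": 0.10, "C": 0.05, "G": 0.80, "T": 0.05},
]

print(call_bases(cycles))
```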
<h2>Summary</h2>
<p>Thank you for reading the second post in my series dedicated to the computer and the genome. The reader should now have at least a basic understanding of what a genome is from a biological perspective (by following the provided links), and of how DNA is sequenced and represented on a computer. The next post will go into how raw sequenced data is assembled into a genome.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jamesrlindsay.com/2011/10/the-genome/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Genome on a Computer</title>
		<link>http://www.jamesrlindsay.com/2011/09/genome-on-a-computer/</link>
		<comments>http://www.jamesrlindsay.com/2011/09/genome-on-a-computer/#comments</comments>
		<pubDate>Tue, 20 Sep 2011 20:03:44 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[assembly and scaffolding]]></category>
		<category><![CDATA[the computer and the genome]]></category>
		<category><![CDATA[genome]]></category>

		<guid isPermaLink="false">http://www.jamesrlindsay.com/?p=79</guid>
		<description><![CDATA[A genome on a computer For the purpose of this and subsequent posts we will build a hypothetical genome, using very simple assumptions which we will change as needed: genome is only made up of the following characters A,C,G,T (base &#8230; <a href="http://www.jamesrlindsay.com/2011/09/genome-on-a-computer/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<h2>A genome on a computer</h2>
<p>For the purpose of this and subsequent posts we will build a hypothetical genome, using very simple assumptions which we will change as needed:</p>
<div id="attachment_96" class="wp-caption alignright" style="width: 310px"><a href="http://www.jamesrlindsay.com/wp-content/uploads/2011/09/genome_ncbi.png"><img class="size-medium wp-image-96" title="genome_ncbi" src="http://www.jamesrlindsay.com/wp-content/uploads/2011/09/genome_ncbi-300x246.png" alt="NCBI Entry" width="300" height="246" /></a><p class="wp-caption-text">genome from ncbi</p></div>
<ul>
<li>the genome is made up of only the following characters: A, C, G, T (bases)</li>
<li>the genome is 10-mer unique</li>
<li>it has a length of roughly 100 base pairs</li>
</ul>
<p>The first assumption is relatively benign and is made by many of the most complex tools used in computational genomics. The third assumption is made simply because about 100 characters is a convenient amount to work with on a blog (the human genome is &gt; 3 gigabases, that&#8217;s a 3 followed by 9 zeros).</p>
<p>The second assumption is slightly more advanced and is the first sighting of a very challenging aspect of working with genomes. We will have to use some jargon to discuss it: <em>k-mer</em> is used to describe any sequence of k bases. The second assumption therefore simply means that no stretch of 10 sequential bases in the genome is the same as any other stretch of 10.</p>
<h3>example</h3>
<p><em>Throughout these posts I will try to include as many visual examples as possible. Many of them will use Python, and I will post all the information necessary to recreate the examples on your own.</em></p>
<p>This first example will give the reader a visualisation of what a genome is on a computer. First I will define an arbitrary genome sequence (<a href="http://www.ncbi.nlm.nih.gov/nuccore/AC245595.1">taken from here</a> for those who are interested):</p>
<pre class="brush:python">genome = "AGACAGACATAGGAGATTGCTGTAGAAACAAAAATATACGAGTATAATATTGCATAAATTAGGGTGTGCACAAAATATCAGAGAGATGAGCTGGCAACA"</pre>
<p>I claimed the genome was 10-mer unique and the following code should demonstrate that.</p>
<pre class="brush:python">for x in range(0, len(genome) - 9):
    print(genome[x:x+10])</pre>
<p>The output of that script is below and you can visually inspect it to be sure that there are no 10-mer duplicates.</p>
<pre class="brush:python">AGACAGACAT
GACAGACATA
ACAGACATAG
CAGACATAGG
AGACATAGGA
GACATAGGAG
ACATAGGAGA
CATAGGAGAT
ATAGGAGATT
TAGGAGATTG
AGGAGATTGC
GGAGATTGCT
GAGATTGCTG
AGATTGCTGT
GATTGCTGTA
ATTGCTGTAG
TTGCTGTAGA
TGCTGTAGAA
GCTGTAGAAA
CTGTAGAAAC
TGTAGAAACA
GTAGAAACAA
TAGAAACAAA
AGAAACAAAA
GAAACAAAAA
AAACAAAAAT
AACAAAAATA
ACAAAAATAT
CAAAAATATA
AAAAATATAC
AAAATATACG
AAATATACGA
AATATACGAG
ATATACGAGT
TATACGAGTA
ATACGAGTAT
TACGAGTATA
ACGAGTATAA
CGAGTATAAT
GAGTATAATA
AGTATAATAT
GTATAATATT
TATAATATTG
ATAATATTGC
TAATATTGCA
AATATTGCAT
ATATTGCATA
TATTGCATAA
ATTGCATAAA
TTGCATAAAT
TGCATAAATT
GCATAAATTA
CATAAATTAG
ATAAATTAGG
TAAATTAGGG
AAATTAGGGT
AATTAGGGTG
ATTAGGGTGT
TTAGGGTGTG
TAGGGTGTGC
AGGGTGTGCA
GGGTGTGCAC
GGTGTGCACA
GTGTGCACAA
TGTGCACAAA
GTGCACAAAA
TGCACAAAAT
GCACAAAATA
CACAAAATAT
ACAAAATATC
CAAAATATCA
AAAATATCAG
AAATATCAGA
AATATCAGAG
ATATCAGAGA
TATCAGAGAG
ATCAGAGAGA
TCAGAGAGAT
CAGAGAGATG
AGAGAGATGA
GAGAGATGAG
AGAGATGAGC
GAGATGAGCT
AGATGAGCTG
GATGAGCTGG
ATGAGCTGGC
TGAGCTGGCA
GAGCTGGCAA
AGCTGGCAAC
GCTGGCAACA</pre>
<p>We can verify programmatically that the example genome is indeed 10-mer unique with the following code:</p>
<pre class="brush:python"># save the genome to a variable.
genome = "AGACAGACATAGGAGATTGCTGTAGAAACAAAAATATACGAGTATAATATTGCATAAATTAGGGTGTGCACAAAATATCAGAGAGATGAGCTGGCAACA"

# make dictionary to track kmers.
kmer_cnt = {}

# loop over every kmer.
for x in range(0, len(genome) - 9):
	# get kmer.
	kmer = genome[x:x+10]

	# add to dictionary
	if kmer not in kmer_cnt:
		kmer_cnt[kmer] = 0
	kmer_cnt[kmer] += 1

# count number of kmers that appeared more than once.
cnt = 0
for kmer in kmer_cnt:
	if kmer_cnt[kmer] &gt; 1:
		cnt += 1

# report count.
print("there were %i repetitive kmers" % cnt)</pre>
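<p>A more compact variant of the same check can be sketched with Counter from the Python standard library. The helper name is_kmer_unique is my own invention for this example, and it makes it easy to try other values of k on the same genome string.</p>

```python
from collections import Counter

genome = "AGACAGACATAGGAGATTGCTGTAGAAACAAAAATATACGAGTATAATATTGCATAAATTAGGGTGTGCACAAAATATCAGAGAGATGAGCTGGCAACA"


def is_kmer_unique(sequence, k):
    """Return True if no k-mer occurs more than once in the sequence."""
    # Count every window of width k, then check that each count is exactly 1.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return all(n == 1 for n in counts.values())


for k in (7, 10):
    print(k, is_kmer_unique(genome, k))
```

<p>For example, the 7-mer AAAATAT shows up twice in the listing above (at the start of AAAATATACG and of AAAATATCAG), so the genome is not 7-mer unique even though it is 10-mer unique.</p>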
<h2>Summary</h2>
<p>This post should give you an idea of what a genome is from a computational perspective. Basically, on a computer we represent a genome as a string of A, C, G, T characters, and our current working example is 10-mer unique, so no run of 10 consecutive characters is ever <strong>repeated</strong>.</p>
<p>The next post will discuss how in real life a genome can be obtained using a genome sequencer and how we work with that information on a computer.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jamesrlindsay.com/2011/09/genome-on-a-computer/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Computer and The Genome</title>
		<link>http://www.jamesrlindsay.com/2011/08/the-computer-and-the-genome/</link>
		<comments>http://www.jamesrlindsay.com/2011/08/the-computer-and-the-genome/#comments</comments>
		<pubDate>Wed, 31 Aug 2011 19:23:23 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[announcements]]></category>
		<category><![CDATA[the computer and the genome]]></category>

		<guid isPermaLink="false">http://www.jamesrlindsay.com/?p=88</guid>
		<description><![CDATA[Introduction Welcome to the introductory post of a series I will be writing which looks at what a genome is (on a computer) and the computational issues associated with obtaining and publishing them. This is the main focus of my &#8230; <a href="http://www.jamesrlindsay.com/2011/08/the-computer-and-the-genome/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<h2>Introduction</h2>
<p>Welcome to the introductory post of a series I will be writing which looks at what a genome is (on a computer) and the computational issues associated with obtaining and publishing them. This is the main focus of my research towards my PhD and I will continue to update and expand on these posts as I learn more.</p>
<p>This and subsequent posts will be directed toward a non-technical audience with little background in computer science and biology, so you will be spared nuanced points and arcane vocabulary as much as possible. However, I will post bits of Python code which I will use to demonstrate points, and readers are encouraged to follow along (Python is found on basically all computers). I welcome comments, clarifications and contributions to this article. Please register to post a comment or send me an email.</p>
<p>This series is inspired by a short book <a title="the computer and the brain" href="http://www.leydesdorff.net/vonneumann/"><em>The Computer and The Brain</em></a> by mathematician John von Neumann.</p>
<div id="attachment_89" class="wp-caption alignleft" style="width: 210px"><a href="http://www.jamesrlindsay.com/wp-content/uploads/2011/08/200px-JohnvonNeumann-LosAlamos.gif"><img class="size-full wp-image-89" title="200px-JohnvonNeumann-LosAlamos" src="http://www.jamesrlindsay.com/wp-content/uploads/2011/08/200px-JohnvonNeumann-LosAlamos.gif" alt="John Von Neumann" width="200" height="260" /></a><p class="wp-caption-text">John von Neumann</p></div>
]]></content:encoded>
			<wfw:commentRss>http://www.jamesrlindsay.com/2011/08/the-computer-and-the-genome/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Tammar Genome Published &#8230; finally</title>
		<link>http://www.jamesrlindsay.com/2011/08/tammar-genome-published/</link>
		<comments>http://www.jamesrlindsay.com/2011/08/tammar-genome-published/#comments</comments>
		<pubDate>Mon, 22 Aug 2011 17:41:25 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[announcements]]></category>
		<category><![CDATA[genome]]></category>
		<category><![CDATA[tammar]]></category>
		<category><![CDATA[wallaby]]></category>

		<guid isPermaLink="false">http://www.jamesrlindsay.com/?p=60</guid>
		<description><![CDATA[Apologies for not updating the blog recently; however, I want to point out that the Tammar Genome has finally been published in Genome Biology! I contributed to the effort by developing a genome improvement algorithm to incorporate new sequencing data, and analyzing &#8230; <a href="http://www.jamesrlindsay.com/2011/08/tammar-genome-published/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Apologies for not updating the blog recently; however, I want to point out that the Tammar Genome has finally been <a title="tammar genome paper" href="http://genomebiology.com/2011/12/8/R81/abstract">published in Genome Biology</a>! I contributed to the effort by developing a genome improvement algorithm to incorporate new sequencing data, and analyzing the repeats and small RNA.</p>
<p>There is a <a title="tammar genome" href="http://blogs.nature.com/news/2011/08/wallaby_genome_hops.html">Nature blog post</a> and an <a title="tammar genome australia" href="http://www.lifescientist.com.au/article/397877/strewth_first_wallaby_genome_published/">Australian news article</a>, which give a brief overview and link to other overviews. I will be starting a series of blog posts which should highlight my work for the tammar genome and give some insight into genome assembly and scaffolding in general.</p>
<div id="attachment_63" class="wp-caption alignleft" style="width: 310px"><a href="http://www.jamesrlindsay.com/wp-content/uploads/2011/08/tammar.jpg"><img class="size-medium wp-image-63" title="tammar wallaby" src="http://www.jamesrlindsay.com/wp-content/uploads/2011/08/tammar-300x211.jpg" alt="about ready to run for it" width="300" height="211" /></a><p class="wp-caption-text">meow look at my genome</p></div>
]]></content:encoded>
			<wfw:commentRss>http://www.jamesrlindsay.com/2011/08/tammar-genome-published/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SGI Altix UV progress report</title>
		<link>http://www.jamesrlindsay.com/2010/12/sgi-altix-uv-progress-report/</link>
		<comments>http://www.jamesrlindsay.com/2010/12/sgi-altix-uv-progress-report/#comments</comments>
		<pubDate>Thu, 16 Dec 2010 15:43:27 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[sgi altix]]></category>

		<guid isPermaLink="false">http://www.jamesrlindsay.com/?p=46</guid>
		<description><![CDATA[After a semester of working with the computer, I am extremely happy with it. Really the astonishing thing about the machine is that there is no difference between it and the desktop PC I&#8217;m using to write this post. &#8230; <a href="http://www.jamesrlindsay.com/2010/12/sgi-altix-uv-progress-report/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>After a semester of working with the computer, I am extremely happy with it. Really the astonishing thing about the machine is that there is no difference between it and the desktop PC I&#8217;m using to write this post. The same code that I write or execute here will work on the machine. I get amazing performance in Bowtie, BWA and other threaded alignment tools. <div id="attachment_48" class="wp-caption alignleft" style="width: 310px"><a href="http://www.jamesrlindsay.com/wp-content/uploads/2010/12/sysinfo.jpg"><img src="http://www.jamesrlindsay.com/wp-content/uploads/2010/12/sysinfo-300x262.jpg" alt="Gnome view of SGI Altix" title="sysinfo" width="300" height="262" class="size-medium wp-image-48" /></a><p class="wp-caption-text">SGI Altix UV 100</p></div></p>
]]></content:encoded>
			<wfw:commentRss>http://www.jamesrlindsay.com/2010/12/sgi-altix-uv-progress-report/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>HTS Analysis with SGI</title>
		<link>http://www.jamesrlindsay.com/2010/08/hts-analysis-with-sgi/</link>
		<comments>http://www.jamesrlindsay.com/2010/08/hts-analysis-with-sgi/#comments</comments>
		<pubDate>Thu, 12 Aug 2010 19:28:41 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[announcements]]></category>
		<category><![CDATA[life in the time of shared memory]]></category>
		<category><![CDATA[sgi altix]]></category>
		<category><![CDATA[altix uv 100]]></category>
		<category><![CDATA[SGI]]></category>
		<category><![CDATA[shared memory]]></category>

		<guid isPermaLink="false">http://www.jamesrlindsay.com/?p=40</guid>
		<description><![CDATA[Intro post describing forthcoming series of articles regarding my experience with a large shared memory system (Altix UV 100). <a href="http://www.jamesrlindsay.com/2010/08/hts-analysis-with-sgi/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<div id="attachment_41" class="wp-caption alignleft" style="width: 310px"><a href="http://www.jamesrlindsay.com/wp-content/uploads/2010/08/sgi.jpg"><img class="size-medium wp-image-41 " title="SGI Altix UV 100" src="http://www.jamesrlindsay.com/wp-content/uploads/2010/08/sgi-300x224.jpg" alt="Tha Beast" width="300" height="224" /></a><p class="wp-caption-text">Uncrating and bringing into the building.</p></div>
<p>Stay tuned for a series of updates describing my experience with using an SGI Altix UV 100 with 512GB RAM and 48 cores to process and analyze sequencing data generated on 454, SOLiD and Illumina platforms. Check out this picture of our lab manager and me taking the computer for a joy ride around campus&#8230; Or rather removing it from its packaging to fit it through the building&#8217;s doors. In subsequent posts I will describe the whole process of choosing, obtaining and using this machine to support data analysis and bioinformatics algorithm development.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jamesrlindsay.com/2010/08/hts-analysis-with-sgi/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>RepeatMasker on the cluster</title>
		<link>http://www.jamesrlindsay.com/2010/07/repeatmasker-on-the-cluster/</link>
		<comments>http://www.jamesrlindsay.com/2010/07/repeatmasker-on-the-cluster/#comments</comments>
		<pubDate>Mon, 12 Jul 2010 18:09:04 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[announcements]]></category>

		<guid isPermaLink="false">http://www.jamesrlindsay.com/?p=33</guid>
		<description><![CDATA[Many organisms&#8217; genomes contain a high percentage of so-called &#8220;repetitive elements&#8221;, and the study of these elements is a very active area of research. After a research group has created a draft assembly of an organism&#8217;s genome, usually the &#8230; <a href="http://www.jamesrlindsay.com/2010/07/repeatmasker-on-the-cluster/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Many organisms&#8217; genomes contain a high percentage of so-called &#8220;repetitive elements&#8221;, and the study of these elements is a very active area of research. After a research group has created a draft assembly of an organism&#8217;s genome, usually the next step is to start annotating various genomic features such as genes and repetitive elements. One tool, RepeatMasker, by the <a href="http://www.systemsbiology.org/">Institute for Systems Biology</a>, has emerged as a de facto standard in de novo and database repeat identification and classification.</p>
<p>I&#8217;ve added some more code to the <a href="http://github.com/eljimbo/NextGenScripts" target="_self">NextGenScripts</a> page. One little script helps split FASTA-formatted files into smaller pieces. The other is a script to run <a href="http://www.repeatmasker.org/">RepeatMasker</a> on a cluster, which greatly speeds up the program&#8217;s execution time. None of my code covers how to install or work with RepeatMasker, so please follow the above link to install the software first.</p>
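<p>The idea behind splitting a FASTA file can be sketched in a few lines of Python. This is not the script from NextGenScripts, just a minimal illustration with made-up records and hypothetical function names: it reads records from FASTA-formatted lines and groups them into batches, each of which could then be written to its own file and handed to a separate RepeatMasker job.</p>

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_records(records, per_chunk):
    """Group records into lists holding at most per_chunk entries."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == per_chunk:
            yield batch
            batch = []
    if batch:
        yield batch


# Invented example records.
fasta_lines = [
    ">seq1", "ACGTAC", "GTA",
    ">seq2", "TTGCA",
    ">seq3", "GGGCCC",
]
batches = list(split_records(read_fasta(fasta_lines), 2))
for number, batch in enumerate(batches, start=1):
    print("chunk", number, [name for name, _ in batch])
```

<p>Splitting by record count is the simplest choice. If the sequences vary a lot in length, splitting by total bases per chunk would balance the cluster jobs better.</p>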
]]></content:encoded>
			<wfw:commentRss>http://www.jamesrlindsay.com/2010/07/repeatmasker-on-the-cluster/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

