<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>James Lindsay&#039;s Portfolio</title>
	<atom:link href="http://www.jamesrlindsay.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.jamesrlindsay.com</link>
	<description>Computers and Genetics</description>
	<lastBuildDate>Mon, 30 Jan 2012 22:08:24 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Startup Weekend: Buses2</title>
		<link>http://www.jamesrlindsay.com/2012/01/startup-weekend-buses2/</link>
		<comments>http://www.jamesrlindsay.com/2012/01/startup-weekend-buses2/#comments</comments>
		<pubDate>Mon, 30 Jan 2012 22:08:24 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.jamesrlindsay.com/?p=110</guid>
		<description><![CDATA[I just participated in Startup Weekend Storrs, an event that brings together entrepreneurial developers, business folk and designers to try and start a business in 54 hours! The teams are chosen based on 60 seconds elevator pitches (no props), unfortunately &#8230; <a href="http://www.jamesrlindsay.com/2012/01/startup-weekend-buses2/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I just participated in <a title="startup weekend storrs" href="http://storrs.startupweekend.org/">Startup Weekend Storrs</a>, an event that brings together entrepreneurial developers, business folk and designers to try and start a business in 54 hours!</p>
<div id="attachment_111" class="wp-caption alignright" style="width: 310px"><a href="http://www.jamesrlindsay.com/wp-content/uploads/2012/01/startup_weekend.jpg"><img class="size-medium wp-image-111" title="Live QR code demo" src="http://www.jamesrlindsay.com/wp-content/uploads/2012/01/startup_weekend-300x225.jpg" alt="Thats me" width="300" height="225" /></a><p class="wp-caption-text">Me during the final presentation</p></div>
<p>The teams are chosen based on 60-second elevator pitches (no props). Unfortunately, I was unable to convey the awesomeness of my idea for a better public genetics database. Also unfortunately, the team I signed up with was disqualified from placing because they had already made too much money (a nice problem to have). Despite all that, the weekend was a great success and I look forward to doing it again.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jamesrlindsay.com/2012/01/startup-weekend-buses2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Genome Into the Computer</title>
		<link>http://www.jamesrlindsay.com/2011/10/the-genome/</link>
		<comments>http://www.jamesrlindsay.com/2011/10/the-genome/#comments</comments>
		<pubDate>Fri, 14 Oct 2011 06:01:58 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[assembly and scaffolding]]></category>
		<category><![CDATA[the computer and the genome]]></category>
		<category><![CDATA[assembly]]></category>
		<category><![CDATA[genome]]></category>
		<category><![CDATA[overview]]></category>
		<category><![CDATA[scaffolding]]></category>

		<guid isPermaLink="false">http://www.jamesrlindsay.com/?p=52</guid>
		<description><![CDATA[What is a genome? A genome is the sequence of DNA that makes up the genetic code of every living  organism. In real life DNA is made up of simple chemical compounds bound together sequentially. In the abstract world of &#8230; <a href="http://www.jamesrlindsay.com/2011/10/the-genome/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<h1>What is a genome?</h1>
<p>A genome is the sequence of DNA that makes up the genetic code of every living organism. In real life, DNA is made up of simple chemical compounds bound together sequentially. In the abstract world of the computer, DNA is represented as a <em>string</em> of characters (for example &#8220;ACGGTGCT&#8221;).</p>
<div id="attachment_67" class="wp-caption alignleft" style="width: 272px"><a href="http://www.jamesrlindsay.com/wp-content/uploads/2011/08/genome.gif"><img class="size-medium wp-image-67" title="genome" src="http://www.jamesrlindsay.com/wp-content/uploads/2011/08/genome-262x300.gif" alt="what is a genome" width="262" height="300" /></a><p class="wp-caption-text">simple view of a genome</p></div>
<p>The genome is the underlying medium upon which all information about an organism is stored. There are many resources in books and on the web to help you gain a more complete understanding of the biological significance of a genome (<a title="wikipedia" href="http://en.wikipedia.org/wiki/Genome" target="_blank">wikipedia</a>, <a title="ncbi" href="http://www.ncbi.nlm.nih.gov/About/primer/genetics_genome.html" target="_blank">ncbi</a>, <a title="tour of genome" href="http://www.dnai.org/c/" target="_blank">a tour</a>). However, these posts focus not on what the genome does but on what the genome is and how we obtain and work with it on a computer.</p>
<h2>Obtain a genome &#8230; IRL</h2>
<p>When I mention that I work with genomes on a computer, most people think I&#8217;m crazy and don&#8217;t quite understand how I can work with something found inside a cell, on a Dell (check out that rhyme). Fortunately, there is a plethora of resources out there that explain how we obtain a genome in real life.</p>
<p><a title="wikipedia" href="http://en.wikipedia.org/wiki/DNA_sequencing" target="_blank">Professor wikipedia</a> is always a good place to start on a new topic, and the linked article provides a somewhat technical overview of DNA sequencing. The US government has been heavily involved in many of the earliest and largest DNA sequencing projects, and the website <a title="govt overview" href="http://www.genome.gov/10001177" target="_blank">genome.gov</a> provides a concise textual introduction to genome sequencing.</p>
<p>Now for a quick overview in my own words: the process of reading DNA from a cell and transferring that information to a computer is known as &#8220;sequencing&#8221;. The ability to sequence DNA has been around for more than 30 years, but the techniques and the speed at which it is done have changed drastically since its inception.</p>
<p>From the 1990s to the early 2000s, DNA sequencing was very expensive, and sequencing the genome of a large mammal (like the human) cost hundreds of millions of dollars. Several for-profit companies began automating the process and developing new techniques. Now it is possible to sequence a human genome for about 10-20 thousand dollars.</p>
<div id="attachment_81" class="wp-caption alignright" style="width: 310px"><a href="http://www.jamesrlindsay.com/wp-content/uploads/2011/08/sequencing_ink.png"><img class="size-medium wp-image-81" title="sequencing overview" src="http://www.jamesrlindsay.com/wp-content/uploads/2011/08/sequencing_ink-300x236.png" alt="overview of DNA sequencing" width="300" height="236" /></a><p class="wp-caption-text">DNA sequencing</p></div>
<p>I&#8217;ve created a simple graphic to explain how sequencing is done. In essence it&#8217;s a three-step process. First, the DNA is isolated from one or more cells and loaded into the machine. Next, the machine uses some tricks to get each base in the DNA to emit a color, and it takes a whole bunch of pictures. Finally, the image files are processed and an algorithm figures out what bases they represent; the resulting string of DNA bases is written to a file on a computer.</p>
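<p>To give a flavor of that last step, here is a toy sketch in Python. It assumes each picture has already been boiled down to four brightness values, one per base, and simply picks the brightest. The channel order, the numbers and the function name <code>call_bases</code> are all invented for illustration; real base-calling software is far more involved.</p>
<pre class="brush:python"># assumed order of the four color channels (made up for this sketch).
CHANNELS = "ACGT"

# hypothetical helper: pick the brightest channel in every picture.
def call_bases(pictures):
    return "".join(CHANNELS[p.index(max(p))] for p in pictures)

# three made-up pictures, each with one brightness value per channel.
pictures = [
    [0.9, 0.1, 0.0, 0.1],
    [0.1, 0.2, 0.8, 0.0],
    [0.0, 0.1, 0.1, 0.7],
]

# print the called bases as one string, ready to be written to a file.
print(call_bases(pictures))</pre>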
<h2>Summary</h2>
<p>Thank you for reading the second post in my series dedicated to the computer and the genome. You should now have at least a basic understanding of what a genome is from a biological perspective (by following the provided links), and of how DNA is sequenced and represented on a computer. The next post will go into how raw sequenced data is assembled into a genome.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jamesrlindsay.com/2011/10/the-genome/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Genome on a Computer</title>
		<link>http://www.jamesrlindsay.com/2011/09/genome-on-a-computer/</link>
		<comments>http://www.jamesrlindsay.com/2011/09/genome-on-a-computer/#comments</comments>
		<pubDate>Tue, 20 Sep 2011 20:03:44 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[assembly and scaffolding]]></category>
		<category><![CDATA[the computer and the genome]]></category>
		<category><![CDATA[genome]]></category>

		<guid isPermaLink="false">http://www.jamesrlindsay.com/?p=79</guid>
		<description><![CDATA[A genome on a computer For the purpose of this and subsequent posts we will build a hypothetical genome, using very simple assumptions which we will change as needed: genome is only made up of the following characters A,C,G,T (base &#8230; <a href="http://www.jamesrlindsay.com/2011/09/genome-on-a-computer/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<h2>A genome on a computer</h2>
<p>For the purpose of this and subsequent posts we will build a hypothetical genome, using very simple assumptions which we will change as needed:</p>
<div id="attachment_96" class="wp-caption alignright" style="width: 310px"><a href="http://www.jamesrlindsay.com/wp-content/uploads/2011/09/genome_ncbi.png"><img class="size-medium wp-image-96" title="genome_ncbi" src="http://www.jamesrlindsay.com/wp-content/uploads/2011/09/genome_ncbi-300x246.png" alt="NCBI Entry" width="300" height="246" /></a><p class="wp-caption-text">genome from ncbi</p></div>
<ul>
<li>the genome is made up only of the characters A, C, G, T (bases)</li>
<li>the genome is 10-mer unique</li>
<li>it has a length of about 100 bases</li>
</ul>
<p>The first assumption is relatively benign and is made by many of the most complex tools used in computational genomics. The third assumption is made simply because about 100 characters is a convenient amount to work with on a blog (the human genome is &gt; 3 gigabases, that&#8217;s a 3 followed by 9 zeros).</p>
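<p>The first and third assumptions are easy to check in code. Here is a minimal sketch; the helper name <code>uses_only_acgt</code> is made up for illustration, and note that real genome files often contain other letters (such as N for an unknown base), which this simple check would reject.</p>
<pre class="brush:python"># hypothetical helper: True when every character is one of A, C, G, T.
def uses_only_acgt(sequence):
    return set(sequence).issubset("ACGT")

# a short toy sequence, not the example genome used later in this post.
toy = "ACGGTGCT"

# check the alphabet, then report the length.
print(uses_only_acgt(toy))
print(len(toy))</pre>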
<p>The second assumption is slightly more advanced and is the first sighting of a very challenging aspect of working with genomes. We will have to use some jargon to discuss it: a <em>k-mer</em> is any sequence of k consecutive bases. The second assumption therefore simply means that no run of 10 consecutive bases appears more than once in the genome.</p>
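<p>To make the jargon concrete, here is a small sketch that lists the k-mers of a short toy string. The function name <code>kmers</code> is made up for illustration. A sequence of length n has n - k + 1 k-mers, one starting at each position where a full window of k bases still fits.</p>
<pre class="brush:python"># hypothetical helper: list every k-mer (window of k consecutive bases).
def kmers(sequence, k):
    return [sequence[i:i+k] for i in range(len(sequence) - k + 1)]

# the 3-mers of an 8 base toy sequence: 8 - 3 + 1 = 6 of them.
for kmer in kmers("ACGGTGCT", 3):
    print(kmer)</pre>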
<h3>example</h3>
<p><em>Throughout these posts I will try to include as many visual examples as possible. Many of them will use Python, and I will post all the information necessary to recreate the examples on your own.</em></p>
<p>This first example will give the reader a visualisation of what a genome is on a computer. First I will define an arbitrary genome sequence (<a href="http://www.ncbi.nlm.nih.gov/nuccore/AC245595.1">taken from here</a> for those who are interested):</p>
<pre class="brush:python">genome = "AGACAGACATAGGAGATTGCTGTAGAAACAAAAATATACGAGTATAATATTGCATAAATTAGGGTGTGCACAAAATATCAGAGAGATGAGCTGGCAACA"</pre>
<p>I claimed the genome was 10-mer unique and the following code should demonstrate that.</p>
<pre class="brush:python"># print every 10-mer (window of 10 consecutive bases) in the genome.
for x in range(0, len(genome) - 9):
    print(genome[x:x+10])</pre>
<p>The output of that script is below and you can visually inspect it to be sure that there are no 10-mer duplicates.</p>
<pre class="brush:python">AGACAGACAT
GACAGACATA
ACAGACATAG
CAGACATAGG
AGACATAGGA
GACATAGGAG
ACATAGGAGA
CATAGGAGAT
ATAGGAGATT
TAGGAGATTG
AGGAGATTGC
GGAGATTGCT
GAGATTGCTG
AGATTGCTGT
GATTGCTGTA
ATTGCTGTAG
TTGCTGTAGA
TGCTGTAGAA
GCTGTAGAAA
CTGTAGAAAC
TGTAGAAACA
GTAGAAACAA
TAGAAACAAA
AGAAACAAAA
GAAACAAAAA
AAACAAAAAT
AACAAAAATA
ACAAAAATAT
CAAAAATATA
AAAAATATAC
AAAATATACG
AAATATACGA
AATATACGAG
ATATACGAGT
TATACGAGTA
ATACGAGTAT
TACGAGTATA
ACGAGTATAA
CGAGTATAAT
GAGTATAATA
AGTATAATAT
GTATAATATT
TATAATATTG
ATAATATTGC
TAATATTGCA
AATATTGCAT
ATATTGCATA
TATTGCATAA
ATTGCATAAA
TTGCATAAAT
TGCATAAATT
GCATAAATTA
CATAAATTAG
ATAAATTAGG
TAAATTAGGG
AAATTAGGGT
AATTAGGGTG
ATTAGGGTGT
TTAGGGTGTG
TAGGGTGTGC
AGGGTGTGCA
GGGTGTGCAC
GGTGTGCACA
GTGTGCACAA
TGTGCACAAA
GTGCACAAAA
TGCACAAAAT
GCACAAAATA
CACAAAATAT
ACAAAATATC
CAAAATATCA
AAAATATCAG
AAATATCAGA
AATATCAGAG
ATATCAGAGA
TATCAGAGAG
ATCAGAGAGA
TCAGAGAGAT
CAGAGAGATG
AGAGAGATGA
GAGAGATGAG
AGAGATGAGC
GAGATGAGCT
AGATGAGCTG
GATGAGCTGG
ATGAGCTGGC
TGAGCTGGCA
GAGCTGGCAA
AGCTGGCAAC
GCTGGCAACA</pre>
<p>We can verify programmatically that the example genome is indeed 10-mer unique with the following code:</p>
<pre class="brush:python"># save the genome to a variable.
genome = "AGACAGACATAGGAGATTGCTGTAGAAACAAAAATATACGAGTATAATATTGCATAAATTAGGGTGTGCACAAAATATCAGAGAGATGAGCTGGCAACA"

# make dictionary to track kmers.
kmer_cnt = {}

# loop over every kmer.
for x in range(0, len(genome) - 9):
	# get kmer.
	kmer = genome[x:x+10]

	# add to dictionary
	if kmer not in kmer_cnt:
		kmer_cnt[kmer] = 0
	kmer_cnt[kmer] += 1

# count number of kmers that appeared more than once.
cnt = 0
for kmer in kmer_cnt:
	if kmer_cnt[kmer] &gt; 1:
		cnt += 1

# report count.
print("there were %i repeated kmers" % cnt)</pre>
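<p>The same check can be written more compactly with <code>collections.Counter</code> from the Python standard library. This is only a sketch of an alternative; the function name <code>repeated_kmers</code> and the toy sequences are made up for illustration.</p>
<pre class="brush:python">from collections import Counter

# hypothetical helper: return the k-mers that occur more than once.
def repeated_kmers(sequence, k):
    counts = Counter(sequence[i:i+k] for i in range(len(sequence) - k + 1))
    # every counted k-mer was seen at least once, so "not 1" means repeated.
    return sorted(kmer for kmer, n in counts.items() if n != 1)

# a toy sequence that is deliberately NOT 4-mer unique.
print(repeated_kmers("ACGTACGTAC", 4))

# an empty list means the sequence is k-mer unique for that k.
print(repeated_kmers("ACGGTGCT", 3))</pre>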
<h2>Summary</h2>
<p>This post should give you an idea of what a genome is from a computational perspective. On a computer we represent a genome as a string of A, C, G, T characters, and our current working example is 10-mer unique, so no run of 10 consecutive characters is ever <strong>repeated</strong>.</p>
<p>The next post will discuss how in real life a genome can be obtained using a genome sequencer and how we work with that information on a computer.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jamesrlindsay.com/2011/09/genome-on-a-computer/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Computer and The Genome</title>
		<link>http://www.jamesrlindsay.com/2011/08/the-computer-and-the-genome/</link>
		<comments>http://www.jamesrlindsay.com/2011/08/the-computer-and-the-genome/#comments</comments>
		<pubDate>Wed, 31 Aug 2011 19:23:23 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[announcements]]></category>
		<category><![CDATA[the computer and the genome]]></category>

		<guid isPermaLink="false">http://www.jamesrlindsay.com/?p=88</guid>
		<description><![CDATA[Introduction Welcome to the introductory post of a series I will be writing which looks at what a genome is (on a computer) and the computational issues associated with obtaining and publishing them. This is the main focus of my &#8230; <a href="http://www.jamesrlindsay.com/2011/08/the-computer-and-the-genome/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<h2>Introduction</h2>
<p>Welcome to the introductory post of a series I will be writing that looks at what a genome is (on a computer) and the computational issues associated with obtaining and publishing one. This is the main focus of my PhD research, and I will continue to update and expand on these posts as I learn more.</p>
<p>This and subsequent posts are directed toward a non-technical audience with little background in computer science or biology, so you will be spared nuanced points and arcane vocabulary as much as possible. However, I will post bits of Python code to demonstrate points, and readers are encouraged to follow along (Python comes preinstalled on most Linux and Mac systems and is a free download elsewhere). I welcome comments, clarifications and contributions to this article. Please register to post a comment or send me an email.</p>
<p>This series is inspired by a short book <a title="the computer and the brain" href="http://www.leydesdorff.net/vonneumann/"><em>The Computer and The Brain</em></a> by mathematician John von Neumann.</p>
<div id="attachment_89" class="wp-caption alignleft" style="width: 210px"><a href="http://www.jamesrlindsay.com/wp-content/uploads/2011/08/200px-JohnvonNeumann-LosAlamos.gif"><img class="size-full wp-image-89" title="200px-JohnvonNeumann-LosAlamos" src="http://www.jamesrlindsay.com/wp-content/uploads/2011/08/200px-JohnvonNeumann-LosAlamos.gif" alt="John Von Neumann" width="200" height="260" /></a><p class="wp-caption-text">John von Neumann</p></div>
]]></content:encoded>
			<wfw:commentRss>http://www.jamesrlindsay.com/2011/08/the-computer-and-the-genome/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Tammar Genome Published &#8230; finally</title>
		<link>http://www.jamesrlindsay.com/2011/08/tammar-genome-published/</link>
		<comments>http://www.jamesrlindsay.com/2011/08/tammar-genome-published/#comments</comments>
		<pubDate>Mon, 22 Aug 2011 17:41:25 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[announcements]]></category>
		<category><![CDATA[genome]]></category>
		<category><![CDATA[tammar]]></category>
		<category><![CDATA[wallaby]]></category>

		<guid isPermaLink="false">http://www.jamesrlindsay.com/?p=60</guid>
		<description><![CDATA[Apologies for not updating the blog recently. I do want to point out, however, that the tammar genome has finally been published in Genome Biology! I contributed to the effort by developing a genome improvement algorithm to incorporate new sequencing data, and by analyzing &#8230; <a href="http://www.jamesrlindsay.com/2011/08/tammar-genome-published/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Apologies for not updating the blog recently. I do want to point out, however, that the tammar genome has finally been <a title="tammar genome paper" href="http://genomebiology.com/2011/12/8/R81/abstract">published in Genome Biology</a>! I contributed to the effort by developing a genome improvement algorithm to incorporate new sequencing data, and by analyzing the repeats and small RNA.</p>
<p>There is a <a title="tammar genome" href="http://blogs.nature.com/news/2011/08/wallaby_genome_hops.html">Nature blog post</a> and an <a title="tammar genome australia" href="http://www.lifescientist.com.au/article/397877/strewth_first_wallaby_genome_published/">Australian news article</a>, which give a brief overview and link to other overviews. I will be starting a series of blog posts that should highlight my work on the tammar genome and give some insight into genome assembly and scaffolding in general.</p>
<div id="attachment_63" class="wp-caption alignleft" style="width: 310px"><a href="http://www.jamesrlindsay.com/wp-content/uploads/2011/08/tammar.jpg"><img class="size-medium wp-image-63" title="tammar wallaby" src="http://www.jamesrlindsay.com/wp-content/uploads/2011/08/tammar-300x211.jpg" alt="about ready to run for it" width="300" height="211" /></a><p class="wp-caption-text">meow look at my genome</p></div>
]]></content:encoded>
			<wfw:commentRss>http://www.jamesrlindsay.com/2011/08/tammar-genome-published/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SGI Altix UV progress report</title>
		<link>http://www.jamesrlindsay.com/2010/12/sgi-altix-uv-progress-report/</link>
		<comments>http://www.jamesrlindsay.com/2010/12/sgi-altix-uv-progress-report/#comments</comments>
		<pubDate>Thu, 16 Dec 2010 15:43:27 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[sgi altix]]></category>

		<guid isPermaLink="false">http://www.jamesrlindsay.com/?p=46</guid>
		<description><![CDATA[After a semester of working with the computer, I am extremely happy with it. Really the astonishing thing about the machine is that there is no difference between it, and the desktop PC I&#8217;m using to write this post on. &#8230; <a href="http://www.jamesrlindsay.com/2010/12/sgi-altix-uv-progress-report/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>After a semester of working with the computer, I am extremely happy with it. The astonishing thing about the machine is that there is no difference between it and the desktop PC I&#8217;m using to write this post. The same code that I write or execute here will work on the machine. I get amazing performance in Bowtie, BWA and other threaded alignment tools. <div id="attachment_48" class="wp-caption alignleft" style="width: 310px"><a href="http://www.jamesrlindsay.com/wp-content/uploads/2010/12/sysinfo.jpg"><img src="http://www.jamesrlindsay.com/wp-content/uploads/2010/12/sysinfo-300x262.jpg" alt="Gnome view of SGI Altix" title="sysinfo" width="300" height="262" class="size-medium wp-image-48" /></a><p class="wp-caption-text">SGI Altix UV 100</p></div></p>
]]></content:encoded>
			<wfw:commentRss>http://www.jamesrlindsay.com/2010/12/sgi-altix-uv-progress-report/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>HTS Analysis with SGI</title>
		<link>http://www.jamesrlindsay.com/2010/08/hts-analysis-with-sgi/</link>
		<comments>http://www.jamesrlindsay.com/2010/08/hts-analysis-with-sgi/#comments</comments>
		<pubDate>Thu, 12 Aug 2010 19:28:41 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[announcements]]></category>
		<category><![CDATA[life in the time of shared memory]]></category>
		<category><![CDATA[sgi altix]]></category>
		<category><![CDATA[altix uv 100]]></category>
		<category><![CDATA[SGI]]></category>
		<category><![CDATA[shared memory]]></category>

		<guid isPermaLink="false">http://www.jamesrlindsay.com/?p=40</guid>
		<description><![CDATA[Intro post describing a forthcoming series of articles about my experience with a large shared memory system (Altix UV 100). <a href="http://www.jamesrlindsay.com/2010/08/hts-analysis-with-sgi/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<div id="attachment_41" class="wp-caption alignleft" style="width: 310px"><a href="http://www.jamesrlindsay.com/wp-content/uploads/2010/08/sgi.jpg"><img class="size-medium wp-image-41 " title="SGI Altix UV 100" src="http://www.jamesrlindsay.com/wp-content/uploads/2010/08/sgi-300x224.jpg" alt="Tha Beast" width="300" height="224" /></a><p class="wp-caption-text">Uncrating and bringing into the building.</p></div>
<p>Stay tuned for a series of updates describing my experience using an SGI Altix UV 100 with 512 GB of RAM and 48 cores to process and analyze sequencing data generated on 454, SOLiD and Illumina platforms. Check out this picture of our lab manager and me taking the computer for a joy ride around campus&#8230; or rather removing it from its packaging to fit it through the building&#8217;s doors. In subsequent posts I will describe the whole process of choosing, obtaining and using this machine to support data analysis and bioinformatics algorithm development.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jamesrlindsay.com/2010/08/hts-analysis-with-sgi/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>RepeatMasker on the cluster</title>
		<link>http://www.jamesrlindsay.com/2010/07/repeatmasker-on-the-cluster/</link>
		<comments>http://www.jamesrlindsay.com/2010/07/repeatmasker-on-the-cluster/#comments</comments>
		<pubDate>Mon, 12 Jul 2010 18:09:04 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[announcements]]></category>

		<guid isPermaLink="false">http://www.jamesrlindsay.com/?p=33</guid>
		<description><![CDATA[Many organisms&#8217; genomes contain a high percentage of so-called &#8220;repetitive elements&#8221;, and the study of these elements is a very active area of research. After a research group has created a draft assembly of an organism&#8217;s genome, usually the &#8230; <a href="http://www.jamesrlindsay.com/2010/07/repeatmasker-on-the-cluster/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Many organisms&#8217; genomes contain a high percentage of so-called &#8220;repetitive elements&#8221;, and the study of these elements is a very active area of research. After a research group has created a draft assembly of an organism&#8217;s genome, usually the next step is to start annotating genomic features such as genes and repetitive elements. One tool, RepeatMasker, by the <a href="http://www.systemsbiology.org/">Institute for Systems Biology</a>, has emerged as a de facto standard in de novo and database-based repeat identification and classification.</p>
<p>I&#8217;ve added some more code to the <a href="http://github.com/eljimbo/NextGenScripts" target="_self">NextGenScripts</a> page. One little script helps split FASTA-formatted files into smaller pieces. The other is a script to run <a href="http://www.repeatmasker.org/">RepeatMasker</a> on a cluster, which greatly speeds up the program&#8217;s execution time. None of my code covers how to install or work with RepeatMasker, so please follow the above link to install the software first.</p>
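<p>To illustrate the idea behind the splitting step, here is a rough Python sketch. It is not the script from the repository; the function names and the toy records are made up, and it works on text already loaded into memory rather than on files.</p>
<pre class="brush:python"># hypothetical helper: break fasta text into records, one per ">" header.
def fasta_records(text):
    records = []
    for line in text.splitlines():
        if line.startswith(">"):
            # a header line starts a new record.
            records.append([line])
        elif records:
            # any other line belongs to the most recent record.
            records[-1].append(line)
    return ["\n".join(record) for record in records]

# hypothetical helper: group the records into pieces of a fixed size.
def split_fasta(text, records_per_piece):
    records = fasta_records(text)
    return ["\n".join(records[i:i+records_per_piece])
            for i in range(0, len(records), records_per_piece)]

# three made-up records split into pieces of at most two records each.
toy = ">seq1\nACGT\n>seq2\nGGCC\nTTAA\n>seq3\nAT"
for piece in split_fasta(toy, 2):
    print(piece)
    print("--")</pre>
<p>Each piece could then be written to its own file and handed to a separate cluster job.</p>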
]]></content:encoded>
			<wfw:commentRss>http://www.jamesrlindsay.com/2010/07/repeatmasker-on-the-cluster/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Preferred Tools and NextGenScripts</title>
		<link>http://www.jamesrlindsay.com/2010/07/preferred-tools-and-nextgenscripts/</link>
		<comments>http://www.jamesrlindsay.com/2010/07/preferred-tools-and-nextgenscripts/#comments</comments>
		<pubDate>Thu, 01 Jul 2010 00:15:19 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[announcements]]></category>

		<guid isPermaLink="false">http://www.jamesrlindsay.com/?p=29</guid>
		<description><![CDATA[I added a new page titled Preferred Tools which is nothing more than a list of tasks and the tools I use to accomplish them. I hope it is useful to people out there. Secondly I started adding some code &#8230; <a href="http://www.jamesrlindsay.com/2010/07/preferred-tools-and-nextgenscripts/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I added a new page titled Preferred Tools, which is nothing more than a list of tasks and the tools I use to accomplish them. I hope it is useful to people out there. I also started adding some code to my repositories. You can check out both pages by clicking the links in the right-hand menu.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jamesrlindsay.com/2010/07/preferred-tools-and-nextgenscripts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Download Whole Genomes Quickly from NCBI</title>
		<link>http://www.jamesrlindsay.com/2010/06/download-whole-genomes-quickly-from-ncbi/</link>
		<comments>http://www.jamesrlindsay.com/2010/06/download-whole-genomes-quickly-from-ncbi/#comments</comments>
		<pubDate>Wed, 30 Jun 2010 22:04:10 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[tips and tricks]]></category>

		<guid isPermaLink="false">http://www.jamesrlindsay.com/demo/?p=16</guid>
		<description><![CDATA[Here is a quick command line script that should work on any linux system with wget. This command will download chromosomes 1- 8 for the possum. You would need to modify the list of numbers to be &#8220;X Y M &#8230; <a href="http://www.jamesrlindsay.com/2010/06/download-whole-genomes-quickly-from-ncbi/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Here is a quick command line script that should work on any Linux system with wget. This command will download chromosomes 1-8 for the opossum (<em>Monodelphis domestica</em>). To get the non-numeric chromosomes, change the list of numbers to &#8220;X Y M Un&#8221;.</p>
<p><code><br />
for i in 01 02 03 04 05 06 07 08; do wget "ftp://ftp.ncbi.nih.gov/genomes/Monodelphis_domestica/CHR_${i}/mdm_ref_chr${i}.fa.gz"; done;</code></p>
<p>You can do this for any species, or really any well-organized FTP site. Here is the link to the <a title="NCBI genome" href="ftp://ftp.ncbi.nih.gov/genomes" target="_self">NCBI genome</a> FTP site.</p>
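<p>If you prefer Python, the same list of download addresses can be built with a short loop. This sketch only prints the URLs rather than fetching them. It copies the naming pattern from the command above, the function name <code>chromosome_urls</code> is made up, and it assumes the non-numeric chromosomes follow the same pattern.</p>
<pre class="brush:python"># naming pattern copied from the wget command above.
PATTERN = "ftp://ftp.ncbi.nih.gov/genomes/Monodelphis_domestica/CHR_{0}/mdm_ref_chr{0}.fa.gz"

# hypothetical helper: build one download URL per chromosome name.
def chromosome_urls(names):
    return [PATTERN.format(name) for name in names]

# numbered chromosomes 01 to 08, then the non-numeric ones.
numbered = ["%02d" % i for i in range(1, 9)]
for url in chromosome_urls(numbered + ["X", "Y", "M", "Un"]):
    print(url)</pre>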
]]></content:encoded>
			<wfw:commentRss>http://www.jamesrlindsay.com/2010/06/download-whole-genomes-quickly-from-ncbi/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

