What is a genome?
A genome is the complete sequence of DNA that makes up the genetic code of a living organism. In real life, DNA is made up of simple chemical compounds, called bases, bound together one after another. In the abstract world of the computer, DNA is represented as a string of characters (for example “ACGGTGCT”), where each character stands for one of the four bases: adenine (A), cytosine (C), guanine (G), or thymine (T).
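To make the idea of "DNA as text" concrete, here is a small Python sketch that treats the example sequence as an ordinary string and computes a few simple properties. Python is just my choice for illustration, and the GC-content helper (`gc_content`, a hypothetical name) is an extra example: it is simply the fraction of bases that are G or C.

```python
from collections import Counter

# The example sequence from the text, stored as an ordinary Python string.
dna = "ACGGTGCT"

# Each character is one base, so the string length is the number of bases.
length = len(dna)

# Count how many times each base appears in the sequence.
base_counts = Counter(dna)


def gc_content(seq: str) -> float:
    """Return the fraction of bases in `seq` that are G or C (hypothetical helper)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


print(length)             # 8
print(dict(base_counts))  # {'A': 1, 'C': 2, 'G': 3, 'T': 2}
print(gc_content(dna))    # 0.625
```

Because a sequence is just a string, everyday string operations such as slicing (`dna[2:5]` gives `"GGT"`) work on DNA without any special tooling.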
The genome is the underlying medium in which all information about an organism is stored. There are many resources in books and on the web to help you gain a more complete understanding of the biological significance of a genome (Wikipedia, NCBI, a tour). However, the focus of these posts is not going to be on what the genome does, but rather on what the genome is and how we obtain and work with it on a computer.
Obtain a genome … IRL
When I mention that I work with genomes on a computer, most people think I’m crazy and don’t quite understand how I can work with something found inside a cell on a Dell (check out that rhyme). Fortunately, there is a plethora of resources out there that explain how we obtain a genome in real life.
Professor Wikipedia is always a good place to start when learning a new topic, and its article on DNA sequencing provides a somewhat technical overview. The US government has been heavily involved in many of the earliest and largest DNA sequencing projects, and the website genome.gov provides a concise textual introduction to genome sequencing.
Now for a quick overview in my own words: the process of reading DNA from a cell and transferring that information to a computer is known as “sequencing”. The ability to sequence DNA has been around for almost 30 years, but the techniques used, and the speed at which sequencing is done, have changed drastically since its inception.
From the 1990s to the early 2000s, DNA sequencing was very expensive, and sequencing the genome of a large mammal (like the human) cost hundreds of millions of dollars. Several for-profit companies began automating the process and developing new techniques. Now it is possible to sequence the human genome for about 10 to 20 thousand dollars.
I’ve created a simple graphic to explain how sequencing is done. In essence, it’s a three-step process. First, the DNA is isolated from one or more cells and loaded into the machine. Next, the machine uses some tricks to get each base in the DNA to emit a color, and it takes a whole bunch of pictures. Finally, the image files are processed, an algorithm figures out which bases the colors represent, and the resulting string of DNA bases is written to a file on a computer.
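The last step, turning pictures into letters, can be illustrated with a deliberately simplified Python sketch. It assumes, hypothetically, that each imaging cycle yields four color intensities (one per base, in A, C, G, T order) and simply calls the brightest channel as the base for that position. The function name `call_bases`, the intensity values, and the output file name are all made up for illustration; real base-calling software is considerably more sophisticated, so treat this as a picture of the idea rather than how any particular machine works.

```python
import tempfile
from pathlib import Path

# Assumed order of the four color channels (a simplifying assumption).
BASES = "ACGT"


def call_bases(cycles):
    """Return a DNA string by picking the brightest channel in each cycle.

    `cycles` is a list of 4-tuples of intensities, one tuple per base
    position, in the channel order given by BASES (hypothetical format).
    """
    called = []
    for intensities in cycles:
        # Index of the channel with the highest intensity in this cycle.
        brightest = max(range(len(BASES)), key=lambda i: intensities[i])
        called.append(BASES[brightest])
    return "".join(called)


# Made-up intensity readings for four imaging cycles (illustrative only).
readings = [
    (0.9, 0.1, 0.0, 0.2),   # A channel is brightest
    (0.1, 0.8, 0.1, 0.0),   # C channel is brightest
    (0.0, 0.2, 0.7, 0.1),   # G channel is brightest
    (0.1, 0.0, 0.2, 0.95),  # T channel is brightest
]

sequence = call_bases(readings)
print(sequence)  # ACGT

# Write the called bases to a plain-text file (hypothetical file name).
out_path = Path(tempfile.gettempdir()) / "called_bases.txt"
out_path.write_text(sequence + "\n")
```

Picking the maximum intensity per cycle is the simplest possible rule; it ignores noise and uncertainty entirely, which is exactly what real base callers spend most of their effort on.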
Summary
Thank you for reading the second post in my series dedicated to the computer and the genome. You should now have at least a basic understanding of what a genome is from a biological perspective (by following the provided links), and of how DNA is sequenced and represented on a computer. The next post will cover how raw sequencing data is assembled into a genome.

