Category Archives: announcements

The Computer and The Genome

Introduction

Welcome to the introductory post of a series on what a genome is (on a computer) and the computational issues involved in obtaining and publishing one. This is the main focus of my PhD research, and I will continue to update and expand these posts as I learn more.

This and subsequent posts are aimed at a non-technical audience with little background in computer science or biology, so I will spare you nuanced points and arcane vocabulary as much as possible. However, I will post bits of Python code to demonstrate points, and readers are encouraged to follow along (Python comes preinstalled on many computers and is freely available for the rest). I welcome comments, clarifications and contributions to this article. Please register to post a comment or send me an email.

This series is inspired by a short book, The Computer and the Brain, by the mathematician John von Neumann.

John von Neumann

 

Tammar Genome Published … finally

Apologies for not updating the blog recently, but I want to point out that the Tammar Genome has finally been published in Genome Biology! I contributed to the effort by developing a genome improvement algorithm to incorporate new sequencing data, and by analyzing the repeats and small RNAs.

There is a Nature blog post, and an Australian news article that gives a brief overview and links to other overviews. I will be starting a series of blog posts that highlight my work on the tammar genome and give some insight into genome assembly and scaffolding in general.

about ready to run for it

meow look at my genome

HTS Analysis with SGI

Tha Beast

Uncrating and bringing into the building.

Stay tuned for a series of updates describing my experience using an SGI Altix UV 100 with 512 GB of RAM and 48 cores to process and analyze sequencing data generated on the 454, SOLiD and Illumina platforms. Check out this picture of our lab manager and me taking the computer for a joy ride around campus… or rather, removing it from its packaging so it would fit through the building's doors. In subsequent posts I will describe the whole process of choosing, obtaining and using this machine to support data analysis and bioinformatics algorithm development.

RepeatMasker on the cluster

Many organisms' genomes contain a high percentage of so-called “repetitive elements”, and the study of these elements is a very active area of research. After a research group has created a draft assembly of an organism's genome, the next step is usually to start annotating genomic features such as genes and repetitive elements. One tool, RepeatMasker, from the Institute for Systems Biology, has emerged as a de facto standard for de novo and database-driven repeat identification and classification.

I’ve added some more code to the NextGenScripts page. One little script splits FASTA-formatted files into smaller pieces. The other runs RepeatMasker on a cluster, which greatly reduces the program's running time because the pieces can be processed in parallel. None of my code covers how to install or work with RepeatMasker, so please follow the above link to install the software first.
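For readers who want to follow along, here is a minimal sketch of the splitting idea in Python. It is not the script from the NextGenScripts page; the function names (read_fasta, split_records, write_pieces) and the output naming scheme (prefix.N.fa) are my own assumptions for illustration. It reads FASTA records one at a time and groups them into pieces of a fixed number of sequences, so each piece could then be handed to a separate cluster job.

```python
from itertools import islice


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_records(records, per_piece):
    """Group records into lists of at most per_piece records each."""
    if per_piece < 1:
        raise ValueError("per_piece must be at least 1")
    records = iter(records)
    while True:
        piece = list(islice(records, per_piece))
        if not piece:
            return
        yield piece


def write_pieces(fasta_path, per_piece, prefix):
    """Write each piece to prefix.N.fa (hypothetical naming) and return the names."""
    names = []
    with open(fasta_path) as handle:
        pieces = split_records(read_fasta(handle), per_piece)
        for number, piece in enumerate(pieces, start=1):
            name = f"{prefix}.{number}.fa"
            with open(name, "w") as out:
                for header, sequence in piece:
                    out.write(f">{header}\n{sequence}\n")
            names.append(name)
    return names
```

Calling write_pieces("assembly.fa", 100, "piece") would, under these assumptions, write files named piece.1.fa, piece.2.fa and so on, each holding up to 100 sequences. Splitting by number of sequences is the simplest choice; splitting by total sequence length would balance the work better when sequence sizes vary a lot.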

Preferred Tools and NextGenScripts

I added a new page titled Preferred Tools, which is nothing more than a list of tasks and the tools I use to accomplish them. I hope it is useful to people out there. I have also started adding some code to my repositories. You can check out both pages by clicking the links in the right-hand menu.

Welcome

Welcome to my updated portfolio. Please stay tuned as I add information about my current work, publications and even some code.