Many organisms’ genomes contain a high percentage of so-called “repetitive elements”, and the study of these elements is a very active area of research. After a research group has created a draft assembly of an organism’s genome, the next step is usually to annotate genomic features such as genes and repetitive elements. One tool, RepeatMasker, from the Institute for Systems Biology, has emerged as a de facto standard for de novo and database-driven repeat identification and classification.
I’ve added some more code to the NextGenScripts page. One small script splits FASTA-formatted files into smaller pieces. The other runs RepeatMasker on a cluster, which greatly reduces the program’s run time. None of my code covers how to install or work with RepeatMasker, so please follow the above link to install the software first.
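To give a rough idea of what the splitting step involves, here is a minimal Python sketch. It is not the script from the NextGenScripts page: the function names (`read_fasta`, `split_fasta`), the `records_per_file` parameter, and the `.partNNNN.fa` output naming are all assumptions made for illustration. It simply writes a fixed number of sequences to each output file, so that each piece could be handed to a separate job.

```python
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(in_path, out_dir, records_per_file: int = 100) -> List[Path]:
    """Split a FASTA file into pieces of at most `records_per_file` records.

    Output names (<stem>.partNNNN.fa) are a hypothetical convention.
    Each sequence is written on a single line, without re-wrapping.
    """
    if records_per_file < 1:
        raise ValueError("records_per_file must be at least 1")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(in_path).stem
    written: List[Path] = []
    out = None
    try:
        with open(in_path) as handle:
            for i, (header, seq) in enumerate(read_fasta(handle)):
                # Start a new output file every `records_per_file` records.
                if i % records_per_file == 0:
                    if out is not None:
                        out.close()
                    path = out_dir / f"{stem}.part{len(written) + 1:04d}.fa"
                    out = open(path, "w")
                    written.append(path)
                out.write(f">{header}\n{seq}\n")
    finally:
        if out is not None:
            out.close()
    return written
```

Calling `split_fasta("assembly.fa", "pieces", records_per_file=500)` (hypothetical file names) returns the list of files it wrote. Splitting by record count is the simplest choice; if your sequences vary a lot in length, splitting by total bases per file would probably balance the cluster jobs better.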