Bioinformatics Advance Access originally published online on June 9, 2008
Bioinformatics 2008 24(15):1676-1680; doi:10.1093/bioinformatics/btn283
Sequence-specific reconstruction from fragmentary databases using seed sequences: implementation and validation on SAGE, proteome and generic sequencing data
¹Instituto do Coração - USP, Av. Prof. Enéas de Carvalho Aguiar 44, Bloco 2, 10° andar, São Paulo SP, 05403-000, Brazil and ²Departamento de Parasitologia, Instituto de Ciências Biomédicas, Universidade de São Paulo, São Paulo SP, 05508-000, Brazil
*To whom correspondence should be addressed.
Abstract
Motivation: DNA assembly programs classically perform an all-against-all comparison of reads to identify overlaps, followed by a multiple sequence alignment and generation of a consensus sequence. If the aim is to assemble a particular segment, instead of a whole genome or transcriptome, a target-specific assembly is a more sensible approach. GenSeed is a Perl program that implements a seed-driven recursive assembly consisting of cycles comprising a similarity search, read selection and assembly. The iterative process results in a progressive extension of the original seed sequence. GenSeed was tested and validated on many applications, including the reconstruction of nuclear genes or segments, full-length transcripts, and extrachromosomal genomes. The robustness of the method was confirmed through the use of a variety of DNA and protein seeds, including short sequences derived from SAGE and proteome projects.
Availability: GenSeed is available under the GNU General Public License at http://www.coccidia.icb.usp.br/genseed/
Contact: argruber{at}usp.br
Supplementary information: Supplementary data are available at http://www.coccidia.icb.usp.br/genseed/
Associate Editor: John Quackenbush
Received on December 13, 2007; revised on May 25, 2008; accepted on June 8, 2008
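
The seed-driven cycle described under Motivation (similarity search, read selection, assembly, repeated until the seed stops growing) can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not GenSeed's Perl implementation: exact k-mer matching stands in for the similarity search, a greedy longest-extension rule stands in for the assembly step, and the names `extend_right`, `extend_left`, `seed_assemble` and the parameter `k` are hypothetical.

```python
def extend_right(contig, reads, k):
    """Extend contig to the right using the read that adds the most bases.

    Toy stand-in for one search/select/assemble cycle: a read is selected
    if it contains the last k bases of the contig and agrees with the
    contig over the whole overlapping region.
    """
    tail = contig[-k:]
    best = contig
    for read in reads:
        start = read.find(tail)
        while start != -1:
            overlap = read[:start + k]  # part of the read that must match the contig end
            consistent = contig.endswith(overlap) or overlap.endswith(contig)
            candidate = contig + read[start + k:]
            if consistent and len(candidate) > len(best):
                best = candidate
            start = read.find(tail, start + 1)
    return best


def extend_left(contig, reads, k):
    """Extend to the left by reversing the strings and reusing extend_right."""
    reversed_reads = [read[::-1] for read in reads]
    return extend_right(contig[::-1], reversed_reads, k)[::-1]


def seed_assemble(seed, reads, k=8, max_cycles=100):
    """Grow the seed in both directions until no read extends it further."""
    if len(seed) < k:
        raise ValueError("seed must be at least k bases long")
    contig = seed
    for _ in range(max_cycles):
        extended = extend_left(extend_right(contig, reads, k), reads, k)
        if extended == contig:  # no progress in this cycle: stop
            break
        contig = extended
    return contig


if __name__ == "__main__":
    # Hypothetical target sequence and overlapping 20-base reads, for illustration only.
    target = "ATGGCGTACGTTAGCCTAGGATCCGATTACAGGCTTAACGGATCGTACCTGAAGTCA"
    reads = [target[i:i + 20] for i in range(0, len(target) - 19, 6)]
    reads.append(target[-20:])
    print(seed_assemble(target[25:37], reads, k=8))
```

Only reads that overlap the current contig are ever examined, which is the point of a target-specific approach: the all-against-all comparison of a classical assembler is avoided. A real pipeline would also have to handle sequencing errors, reverse-complement reads and repeats, none of which this sketch attempts.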