De novo identification of repeat families in large genomes. Price, A. L., Jones, N. C., & Pevzner, P. A. Bioinformatics (Oxford, England), 21 Suppl 1:i351–i358, June 2005.
@article{price_novo_2005,
	title = {De novo identification of repeat families in large genomes},
	volume = {21 Suppl 1},
	issn = {1367-4803},
	doi = {10.1093/bioinformatics/bti1018},
	abstract = {MOTIVATION: De novo repeat family identification is a challenging algorithmic problem of great practical importance. As the number of genome sequencing projects increases, there is a pressing need to identify the repeat families present in large, newly sequenced genomes. We develop a new method for de novo identification of repeat families via extension of consensus seeds; our method enables a rigorous definition of repeat boundaries, a key issue in repeat analysis.
RESULTS: Our RepeatScout algorithm is more sensitive and is orders of magnitude faster than RECON, the dominant tool for de novo repeat family identification in newly sequenced genomes. Using RepeatScout, we estimate that approximately 2\% of the human genome and 4\% of mouse and rat genomes consist of previously unannotated repetitive sequence.
AVAILABILITY: Source code is available for download at http://www-cse.ucsd.edu/groups/bioinformatics/software.html},
	language = {eng},
	journal = {Bioinformatics (Oxford, England)},
	author = {Price, Alkes L. and Jones, Neil C. and Pevzner, Pavel A.},
	month = jun,
	year = {2005},
	pmid = {15961478},
	keywords = {Algorithms, Animals, Sequence Analysis, DNA, Sequence Alignment, Computational Biology, Genome, Internet, Models, Genetic, Caenorhabditis, Caenorhabditis elegans, Mice, Models, Statistical, Rats, Repetitive Sequences, Nucleic Acid},
	pages = {i351--i358},
	file = {Full text:C\:\\Users\\qcarrade\\Zotero\\storage\\AP83XJG9\\Price et al. - 2005 - De novo identification of repeat families in large.pdf:application/pdf},
}
