RAPSearch: a fast protein similarity search tool for short reads. Ye, Y., Choi, J., & Tang, H.
Background: Next Generation Sequencing (NGS) is producing enormous corpora of short DNA reads, affecting emerging fields like metagenomics. Protein similarity search, a key step in annotating protein-coding genes in these short reads and identifying their biological functions, faces daunting challenges because of the sheer size of the short-read datasets. Results: We developed a fast protein similarity search tool, RAPSearch, that uses a reduced amino acid alphabet and a suffix array to detect seeds of flexible length. For the short reads we tested (translated in six frames), RAPSearch was ~20-90 times faster than BLASTX. RAPSearch missed only a small fraction (~1.3-3.2%) of BLASTX similarity hits, but it also discovered additional homologous proteins (~0.3-2.1%) that BLASTX missed. By contrast, BLAT, a tool that is even slightly faster than RAPSearch, lost significant sensitivity compared to RAPSearch and BLAST.
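
The general idea of seeding on a reduced alphabet with a suffix array can be sketched in a few lines of Python. This is an illustration only, not RAPSearch's implementation: the amino acid grouping in `GROUPS`, the `min_len` threshold, and all function names are assumptions made for this example, and the suffix array is built naively rather than with an efficient construction algorithm.

```python
# Hypothetical reduced alphabet: residues with similar properties share
# one symbol. This grouping is an assumption, not the one RAPSearch uses.
GROUPS = ["AST", "C", "DE", "FY", "G", "H", "ILMV", "KR", "NQ", "P", "W"]
REDUCE = {aa: group[0] for group in GROUPS for aa in group}


def reduce_seq(seq):
    """Map a protein sequence onto the reduced alphabet ('X' if unknown)."""
    return "".join(REDUCE.get(aa, "X") for aa in seq)


def build_suffix_array(text):
    """Naive suffix array: start positions sorted by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def common_prefix_len(a, i, b, j):
    """Length of the longest common prefix of a[i:] and b[j:]."""
    n = 0
    while i + n < len(a) and j + n < len(b) and a[i + n] == b[j + n]:
        n += 1
    return n


def find_seeds(query, subject, min_len=4):
    """Return (query_pos, subject_pos, length) for the longest reduced-alphabet
    match starting at each query position, keeping matches >= min_len."""
    rq, rs = reduce_seq(query), reduce_seq(subject)
    sa = build_suffix_array(rs)
    seeds = []
    for qpos in range(len(rq)):
        # Binary search for where rq[qpos:] would sit among the sorted suffixes.
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            if rs[sa[mid]:] < rq[qpos:]:
                lo = mid + 1
            else:
                hi = mid
        # The longest match is always one of the two neighbours of that slot.
        best_len, best_pos = 0, -1
        for k in (lo - 1, lo):
            if 0 <= k < len(sa):
                n = common_prefix_len(rq, qpos, rs, sa[k])
                if n > best_len:
                    best_len, best_pos = n, sa[k]
        if best_len >= min_len:
            seeds.append((qpos, best_pos, best_len))
    return seeds


if __name__ == "__main__":
    print(find_seeds("MKVLDE", "GGMRILEDPP"))
```

In this toy case the query `MKVLDE` and the subject `GGMRILEDPP` share no identical 4-mer, yet under the assumed grouping both contain the reduced string `IKIIDD`, so a seed of length 6 is reported at query position 0 and subject position 2. Because each seed is extended as far as the reduced strings agree, its length is not fixed in advance, which is the sense of "flexible length" this sketch tries to convey.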
