Article

RAPSearch: a fast protein similarity search tool for short reads

Journal

BMC BIOINFORMATICS
Volume 12, Issue -, Pages -

Publisher

BIOMED CENTRAL LTD
DOI: 10.1186/1471-2105-12-159

Keywords

short reads similarity search; suffix array; reduced amino acid alphabet; metagenomics

Funding

  1. NIH [1R01HG004908]
  2. NSF [DBI-0845685]
  3. Div Of Biological Infrastructure
  4. Direct For Biological Sciences [0845685] Funding Source: National Science Foundation

Background: Next Generation Sequencing (NGS) is producing enormous corpuses of short DNA reads, affecting emerging fields like metagenomics. Protein similarity search, a key step in annotating protein-coding genes in these short reads and identifying their biological functions, faces daunting challenges because of the sheer size of the short read datasets.

Results: We developed a fast protein similarity search tool, RAPSearch, that uses a reduced amino acid alphabet and a suffix array to detect seeds of flexible length. For the short reads we tested (translated in 6 frames), RAPSearch achieved a ~20-90 times speedup compared to BLASTX. RAPSearch missed only a small fraction (~1.3-3.2%) of BLASTX similarity hits, but it also discovered additional homologous proteins (~0.3-2.1%) that BLASTX missed. By contrast, BLAT, a tool that is even slightly faster than RAPSearch, had a significant loss of sensitivity compared to RAPSearch and BLAST.

Conclusions: RAPSearch is implemented as open-source software and is accessible at http://omics.informatics.indiana.edu/mg/RAPSearch. It enables faster protein similarity search. The application of RAPSearch in metagenomics has also been demonstrated.
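The seeding idea named in the abstract (a reduced amino acid alphabet combined with a suffix array) can be illustrated with a small sketch. This is not RAPSearch's code: the residue grouping in `GROUPS`, the function names, and the `min_len` threshold are all assumptions chosen for illustration, and the suffix array is built naively rather than with an efficient construction algorithm.

```python
# Illustrative sketch only. GROUPS is a hypothetical grouping of similar
# amino acids; it is NOT the reduced alphabet used by RAPSearch.
GROUPS = ["AST", "C", "DE", "FWY", "G", "H", "ILMV", "KR", "NQ", "P"]
REDUCE = {aa: str(i) for i, group in enumerate(GROUPS) for aa in group}


def reduce_seq(protein):
    """Map each residue to its group symbol; unknown residues become 'X'."""
    return "".join(REDUCE.get(aa, "X") for aa in protein.upper())


def build_suffix_array(text):
    """Naive suffix array: suffix start positions in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def common_prefix_len(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_seed(query, text, sa, min_len=3):
    """Find the longest prefix of `query` that occurs somewhere in `text`.

    Returns (length, position_in_text), or None if the best match is
    shorter than `min_len` (a hypothetical minimum seed length).
    """
    # Binary search for where `query` would be inserted among the suffixes.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:] < query:
            lo = mid + 1
        else:
            hi = mid
    # The suffix sharing the longest prefix with `query` is adjacent
    # to the insertion point in sorted order.
    best = (0, -1)
    for idx in (lo - 1, lo):
        if 0 <= idx < len(sa):
            length = common_prefix_len(query, text[sa[idx]:])
            if length > best[0]:
                best = (length, sa[idx])
    return best if best[0] >= min_len else None


# Two different peptides that become identical in the reduced alphabet.
db = reduce_seq("MKVLAAGIT")
sa = build_suffix_array(db)
seed = longest_seed(reduce_seq("MRILSSGLT"), db, sa)
```

Because the match length is not fixed in advance, the seed extends as far as the reduced sequences agree, which is one way to read "seeds of flexible length". In a six-frame setting, each translated frame of a read would be reduced and queried in the same way.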
