Article

CD-HIT: accelerated for clustering the next-generation sequencing data

Journal

BIOINFORMATICS
Volume 28, Issue 23, Pages 3150-3152

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/bts565

Funding

  1. National Institutes of Health, National Center for Research Resources [R01RR025030]
  2. National Human Genome Research Institute [R01HG005978]

CD-HIT is a widely used program for clustering biological sequences to reduce sequence redundancy and improve the performance of other sequence analyses. In response to the rapid growth in the amount of sequencing data produced by next-generation sequencing technologies, we have developed a new CD-HIT program, accelerated with a novel parallelization strategy and several other techniques, that clusters such datasets efficiently. Our tests demonstrated very good speedup from parallelization for up to ~24 cores and quasi-linear speedup for up to ~8 cores. The enhanced CD-HIT can handle very large datasets in much less time than previous versions.
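The abstract does not spell out the clustering procedure itself. CD-HIT is generally described as a greedy incremental method: sequences are sorted by decreasing length, and each sequence is compared with the representatives of the clusters found so far; it joins a cluster if its identity to that representative meets the threshold, and otherwise it becomes the representative of a new cluster. The Python sketch below illustrates only that basic idea, under loudly simplifying assumptions: identity is approximated as a longest-common-subsequence count divided by the length of the shorter sequence (a stand-in for CD-HIT's real alignment step), and the short-word filtering and multithreaded parallelization that give the program its speed are not modelled. The names `lcs_length`, `identity`, and `greedy_incremental_cluster`, and the 0.9 threshold, are hypothetical and are not part of CD-HIT's interface.

```python
from typing import Dict, List, Tuple

# Hypothetical illustration of greedy incremental clustering; NOT CD-HIT's code.


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (O(len(a)*len(b)) DP).

    Simplifying assumption: this counts the maximum number of identical residues
    in a gapped alignment where gaps cost nothing, standing in for real alignment.
    """
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0] * (len(b) + 1)
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Identical residues divided by the length of the shorter sequence."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    return lcs_length(a, b) / shorter


def greedy_incremental_cluster(
    seqs: Dict[str, str], threshold: float = 0.9
) -> List[Tuple[str, List[str]]]:
    """Cluster sequences greedily, longest first.

    Each sequence is compared with the existing representatives in order; it joins
    the first one it matches at >= threshold identity, otherwise it starts a new
    cluster and becomes that cluster's representative.
    """
    # Longest first, so every representative is the longest member of its cluster.
    order = sorted(seqs, key=lambda name: len(seqs[name]), reverse=True)
    clusters: List[Tuple[str, List[str]]] = []
    for name in order:
        seq = seqs[name]
        for rep_name, members in clusters:
            if identity(seq, seqs[rep_name]) >= threshold:
                members.append(name)
                break
        else:
            # No representative was similar enough: open a new cluster.
            clusters.append((name, [name]))
    return clusters


if __name__ == "__main__":
    demo = {
        "s1": "MKTAYIAKQRQISFVKSHFSRQ",
        "s2": "MKTAYIAKQRQISFVKSHFSR",  # s1 without its last residue
        "s3": "GSHMLEDPVDAFQLGLKRNPAA",  # unrelated sequence
        "s4": "MKTAYIAKQRQ",  # fragment of s1
    }
    for rep, members in greedy_incremental_cluster(demo, threshold=0.9):
        print(rep, members)
    # prints:
    # s1 ['s1', 's2', 's4']
    # s3 ['s3']
```

Joining the first sufficiently similar representative, rather than searching for the best one, keeps each query's work small; the CD-HIT user's guide describes this as its fast mode (`-g 0`). Even so, every new sequence may be compared against many representatives, which is why filtering and parallelization matter for next-generation sequencing datasets.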
