Article

CD-HIT: accelerated for clustering the next-generation sequencing data

Journal

BIOINFORMATICS
Volume 28, Issue 23, Pages 3150-3152

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/bts565

Funding

  1. National Institute of Health from the National Center for Research Resources [R01RR025030]
  2. National Human Genome Research Institute [R01HG005978]

Abstract

CD-HIT is a widely used program for clustering biological sequences to reduce sequence redundancy and improve the performance of other sequence analyses. In response to the rapid increase in the amount of sequencing data produced by next-generation sequencing technologies, we have developed a new CD-HIT program accelerated with a novel parallelization strategy and some other techniques to allow efficient clustering of such datasets. Our tests demonstrated very good speedup derived from the parallelization for up to ~24 cores and a quasi-linear speedup for up to ~8 cores. The enhanced CD-HIT is capable of handling very large datasets in much shorter time than previous versions.
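To make "clustering to reduce sequence redundancy" concrete, the following is a minimal sketch of a generic greedy incremental clustering scheme. The abstract does not describe the algorithm, so this is an illustration under stated assumptions and not the CD-HIT implementation: the names `identity` and `greedy_cluster`, the 0.8 threshold, and the toy ungapped identity measure are all hypothetical, and no parallelization is shown.

```python
from typing import List, Tuple


def identity(rep: str, seq: str) -> float:
    """Toy identity (assumption): fraction of positions in the shorter
    sequence that match the other one, compared ungapped from position 0.
    A real tool would use a proper alignment instead."""
    shorter = min(len(rep), len(seq))
    if shorter == 0:
        return 0.0
    matches = sum(a == b for a, b in zip(rep, seq))
    return matches / shorter


def greedy_cluster(seqs: List[str], threshold: float = 0.8) -> List[Tuple[str, List[str]]]:
    """Process sequences longest first. Each sequence joins the first
    cluster whose representative it matches at or above the threshold;
    otherwise it becomes the representative of a new cluster."""
    clusters: List[Tuple[str, List[str]]] = []
    for seq in sorted(seqs, key=len, reverse=True):
        for rep, members in clusters:
            if identity(rep, seq) >= threshold:
                members.append(seq)
                break
        else:
            # No existing representative was similar enough.
            clusters.append((seq, [seq]))
    return clusters


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "ACGTACGTAA", "TTTTTTTT", "ACGTACGT"]
    for rep, members in greedy_cluster(reads):
        print(rep, len(members))
```

Keeping only the representative of each cluster gives the reduced, non-redundant set. In this sketch every sequence is compared against the representatives one at a time, which is the kind of work that becomes expensive as datasets grow.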
