Article; Proceedings Paper

hc-OTU: A Fast and Accurate Method for Clustering Operational Taxonomic Units Based on Homopolymer Compaction

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TCBB.2016.2535326

Keywords

Clustering algorithm; operational taxonomic unit (OTU); pyrosequencing; metagenomics; 16S rRNA

Funding

  1. National Research Foundation of Korea (NRF) - Korea government (Ministry of Science, ICT and Future Planning, MSIP) [2011-0009963, 2014M3C9A3063541]
  2. Industrial Core Technology Development Program [10040176]
  3. Ministry of Trade, Industry and Energy (MOTIE, Korea)
  4. Samsung Electronics Co., Ltd.
  5. NRF - Korea government (MSIP) [2013R1A1A1057949, 2014R1A4A1007895]
  6. National Research Foundation of Korea [2014M3C9A3063541, 22A20151713442, 2014R1A4A1007895, 2013R1A1A1057949] Funding Source: Korea Institute of Science & Technology Information (KISTI), National Science & Technology Information Service (NTIS)

To assess the genetic diversity of an environmental sample in metagenomics studies, the amplicon sequences of 16S rRNA genes need to be clustered into operational taxonomic units (OTUs). Many existing OTU clustering tools trade accuracy against computational efficiency. We propose a novel OTU clustering algorithm, hc-OTU, which achieves high accuracy and fast runtime by exploiting homopolymer compaction and k-mer profiling to significantly reduce the time needed to compute pairwise distances between amplicon sequences. We comprehensively compare the proposed method with other widely used methods, including UCLUST, CD-HIT, MOTHUR, ESPRIT, ESPRIT-TREE, and CLUSTOM, using nine experimental datasets and several evaluation metrics, such as normalized mutual information, adjusted Rand index, measure of concordance, and F-score. Our evaluation shows that the proposed method achieves accuracy comparable to that of MOTHUR and ESPRIT-TREE, two widely used OTU clustering methods, while delivering orders-of-magnitude speedups.
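
The abstract names two ideas, homopolymer compaction and k-mer profiling, without giving their details. The following is a minimal Python sketch of what those two steps could look like, not the hc-OTU implementation itself. The function names, the choice of k, and the shared-k-mer distance formula are all assumptions made for illustration.

```python
from collections import Counter
from itertools import groupby


def compact_homopolymers(seq: str) -> str:
    """Collapse every run of identical bases to a single base.

    Illustrative assumption: reads that differ only in homopolymer
    length (a common pyrosequencing error) become identical.
    """
    return "".join(base for base, _ in groupby(seq))


def kmer_profile(seq: str, k: int = 3) -> Counter:
    """Count every overlapping k-mer in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a: str, b: str, k: int = 3) -> float:
    """Hypothetical alignment-free distance from shared k-mer counts.

    Returns 0.0 when every k-mer of the shorter sequence is shared,
    and 1.0 when no k-mer is shared or a sequence is shorter than k.
    """
    denom = min(len(a), len(b)) - k + 1
    if denom <= 0:
        return 1.0
    # Counter intersection keeps the minimum count of each k-mer.
    shared = sum((kmer_profile(a, k) & kmer_profile(b, k)).values())
    return 1.0 - shared / denom


if __name__ == "__main__":
    read_a = "ACGGGTTA"
    read_b = "ACGTTTA"
    ca, cb = compact_homopolymers(read_a), compact_homopolymers(read_b)
    print(ca, cb, kmer_distance(ca, cb, k=2))
```

In this sketch the two example reads differ only in run lengths, so they compact to the same string and their k-mer distance is zero. A cheap comparison like this could be used to skip costly pairwise alignments for clearly similar or clearly dissimilar reads, although how hc-OTU actually combines the two steps is not stated in the abstract.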
