Article

MetaDecoder: a novel method for clustering metagenomic contigs

Journal

MICROBIOME
Volume 10, Issue 1, Pages -

Publisher

BMC
DOI: 10.1186/s40168-022-01237-8

Keywords

MetaDecoder; Clustering algorithm; Metagenome; DPGMM; GMM

Funding

  1. National Natural Science Foundation of China [32170616, 82170896, 31970569, 31871264]
  2. Natural Science Basic Research Program of Shaanxi Province [2021JC-02]
  3. special guidance funds for the construction of world-class universities (disciplines)
  4. characteristic development in central universities

This study introduces MetaDecoder, a novel clustering algorithm that classifies metagenomic contigs based on their k-mer frequencies and coverages. In benchmark tests on simulated and real-world datasets, MetaDecoder clustered metagenomic contigs effectively.
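To make the "frequencies of k-mers" feature concrete, the following is a minimal sketch of how a k-mer frequency vector might be computed for one contig. It is an illustration only, not MetaDecoder's code: the function names `canonical` and `kmer_frequencies`, the default `k=4`, and the choice to merge each k-mer with its reverse complement are all assumptions made for this example.

```python
from collections import Counter
from itertools import product

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_frequencies(sequence, k=4):
    """Map every canonical k-mer to its relative frequency in `sequence`.

    Assumption for illustration: strands are merged via canonical k-mers and
    windows containing ambiguous bases (e.g. N) are skipped.
    """
    sequence = sequence.upper()
    keys = sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    total = sum(counts.values())
    return {key: (counts[key] / total if total else 0.0) for key in keys}


# Hypothetical toy contig; real contigs are thousands of bases long.
freqs = kmer_frequencies("ACGTACGTTTGACCA", k=4)
```

Each contig then becomes a fixed-length vector (136 canonical 4-mers under these assumptions) that a clustering model can consume alongside its coverage values.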
Background: Clustering metagenomic contigs into potential genomes is a key step in investigating the functional roles of microbial populations. Existing algorithms have achieved considerable success on simulated and real sequencing datasets, but accurately classifying contigs from complex metagenomes remains a challenge.

Results: We introduce MetaDecoder, a novel clustering algorithm that classifies metagenomic contigs based on k-mer frequencies and coverages. MetaDecoder is built as a two-layer model. The first layer is a GPU-based modified Dirichlet process Gaussian mixture model (DPGMM) that controls the weight of each DPGMM cluster to avoid over-segmentation, dynamically dissolving contigs in small clusters and reassigning them to the remaining clusters. The second layer comprises a semi-supervised k-mer frequency probabilistic model and a modified Gaussian mixture model (GMM) that models coverage based on single-copy marker genes. Benchmarks on simulated and real-world datasets demonstrated that MetaDecoder can serve as a promising approach for effectively clustering metagenomic contigs.

Conclusions: We developed the GPU-based MetaDecoder for effectively clustering metagenomic contigs and reconstructing microbial communities from microbial data. Applying MetaDecoder to both simulated and real-world datasets showed that it can generate more complete clusters with lower contamination. Using MetaDecoder, we identified novel high-quality genomes and expanded the existing catalog of bacterial genomes.
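The idea of dissolving small clusters and reassigning their contigs can be sketched as follows. This is a deliberately simplified stand-in, not the published method: the abstract describes the dissolution as happening dynamically inside a modified DPGMM, whereas this sketch applies it once after clustering, approximates a cluster's weight by its share of contigs, and reassigns by nearest centroid. The function names and the `min_weight` threshold are hypothetical.

```python
import math
from collections import defaultdict


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def dissolve_small_clusters(features, labels, min_weight=0.05):
    """Reassign members of low-weight clusters to the nearest remaining cluster.

    Assumptions for illustration: weight = fraction of contigs in the cluster,
    and "nearest" = smallest Euclidean distance to a surviving centroid.
    """
    members = defaultdict(list)
    for index, label in enumerate(labels):
        members[label].append(index)

    total = len(labels)
    kept = {lab: idxs for lab, idxs in members.items()
            if len(idxs) / total >= min_weight}
    if not kept:  # nothing survives the threshold, so leave labels unchanged
        return list(labels)

    centroids = {lab: centroid([features[i] for i in idxs])
                 for lab, idxs in kept.items()}

    new_labels = list(labels)
    for lab, idxs in members.items():
        if lab in kept:
            continue
        for i in idxs:
            new_labels[i] = min(
                centroids,
                key=lambda c: math.dist(features[i], centroids[c]),
            )
    return new_labels


# Hypothetical 2-D features: two real groups plus one stray singleton cluster.
features = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (0.4, 0.3)]
labels = [0, 0, 0, 1, 1, 1, 2]
merged = dissolve_small_clusters(features, labels, min_weight=0.2)
```

In this toy input the singleton cluster `2` holds one of seven points, falls below the threshold, and its point is absorbed by cluster `0`, which is the kind of over-segmentation the weight control is meant to prevent.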
