Article

Large-scale k-mer-based analysis of the informational properties of genomes, comparative genomics and taxonomy

PLOS ONE (2021)

Journal

PLOS ONE

Volume 16, Issue 10

Publisher

PUBLIC LIBRARY SCIENCE

DOI: 10.1371/journal.pone.0258693

Category

Multidisciplinary Sciences

Summary

Information-theoretic approaches are widely used in bioinformatics, particularly in comparative genomics, where alignment-free methods based on short DNA words (k-mers) are powerful. In an analysis of genomes from the KEGG GENOME database, hierarchical clustering of 1634 genus-level representative genomes using pairwise 21-mer and 31-mer Jaccard similarities best recapitulated a phylogenetic/taxonomic tree of life, supporting quantitative taxonomic classification based on whole-genome similarity.

Abstract

Information theoretic approaches are ubiquitous and effective in a wide variety of bioinformatics applications. In comparative genomics, alignment-free methods, based on short DNA words, or k-mers, are particularly powerful. We evaluated the utility of varying k-mer lengths for genome comparisons by analyzing their sequence space coverage of 5805 genomes in the KEGG GENOME database. In subsequent analyses on four k-mer lengths spanning the relevant range (11, 21, 31, 41), hierarchical clustering of 1634 genus-level representative genomes using pairwise 21- and 31-mer Jaccard similarities best recapitulated a phylogenetic/taxonomic tree of life with clear boundaries for superkingdom domains and high subtree similarity for named taxons at lower levels (family through phylum). By analyzing ~14.2M prokaryotic genome comparisons by their lowest-common-ancestor taxon levels, we detected many potential misclassification errors in a curated database, further demonstrating the need for wide-scale adoption of quantitative taxonomic classifications based on whole-genome similarity.
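
The pairwise k-mer Jaccard similarity mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `kmer_set` and `jaccard` and the toy sequences are hypothetical, only forward-strand k-mers are counted (a simplification), and k = 4 is used purely so the example fits on short strings, whereas the study compared k = 11, 21, 31 and 41 on whole genomes.

```python
from __future__ import annotations


def kmer_set(seq: str, k: int) -> set[str]:
    """Return the distinct k-mers in seq (forward strand only, a simplification)."""
    seq = seq.upper()
    # range() is empty when the sequence is shorter than k, giving an empty set.
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; returns 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Toy sequences (hypothetical, far shorter than real genomes).
s1 = "ACGTACGTGA"
s2 = "ACGTACGTCC"
k = 4

sim = jaccard(kmer_set(s1, k), kmer_set(s2, k))
print(f"{k}-mer Jaccard similarity: {sim:.2f}")
```

Each toy sequence here has six distinct 4-mers, four of which are shared, so the similarity is 4/8. A matrix of such pairwise values (or the corresponding distances, 1 minus similarity) is the kind of input a hierarchical clustering step would take.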
