Article

A convenient correspondence between k-mer-based metagenomic distances and phylogenetically-informed β-diversity measures

Journal

PLOS COMPUTATIONAL BIOLOGY
Volume 19, Issue 1

Publisher

PUBLIC LIBRARY SCIENCE
DOI: 10.1371/journal.pcbi.1010821

K-mer-based distances are often used to describe differences between communities in metagenome sequencing studies. In this paper, we show a strong relationship between k-mer-based distances and phylogenetically-informed beta-diversity measures. Our results allow for phylogenetically-informed analyses using only k-mer data and provide insight into one class of phylogenetically-informed beta-diversity measures.
k-mer-based distances are often used to describe the differences between communities in metagenome sequencing studies because of their computational convenience and history of effectiveness. Although k-mer-based distances do not use information about taxon abundances, we show that one class of k-mer distances between metagenomes (the Euclidean distance between k-mer spectra, or EKS distances) is very closely related to a class of phylogenetically-informed beta-diversity measures that explicitly use both the taxon abundances and information about the phylogenetic relationships among the taxa. Furthermore, we show that both of these distances can be interpreted as using certain features of the taxon abundances that are related to the phylogenetic tree. Our results allow practitioners to perform phylogenetically-informed analyses when they only have k-mer data available and provide a theoretical basis for using k-mer spectra with relatively small values of k (on the order of 4-5). They are also useful for analysts who wish to know more about the properties of any method based on k-mer spectra, and they provide insight into one class of phylogenetically-informed beta-diversity measures.

Author summary

Microbiologists have two major strategies for understanding the bacterial communities present in the environment: shotgun metagenome sequencing and amplicon sequencing. Both involve taking samples from the environment, extracting DNA from those samples, and sequencing the extracted DNA. They have different strengths and give different kinds of information about the communities. Because they give different kinds of information, methods for analyzing microbiome data tend to be developed for and used on just one kind of study.
In this paper, we show a strong relationship between a set of methods for measuring distances between samples in shotgun metagenome sequencing datasets (the k-mer-based distances) and a set of methods for measuring distances between samples in amplicon sequencing datasets (the phylogenetically-informed beta-diversity measures). This is a convenient correspondence because k-mer spectra are easier to extract from shotgun metagenome sequencing datasets than the taxon abundances that would be needed to compute the phylogenetically-informed beta diversities. Therefore, if an analyst would like to compute phylogenetically-informed distances between communities from a shotgun metagenome sequencing dataset, our results show that they can work directly with the k-mer spectra without first estimating taxon abundances. The results also imply that any of the many methods that are based on k-mer spectra are implicitly using phylogenetic information.
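To make the EKS distance concrete, here is a minimal Python sketch of the Euclidean distance between the k-mer spectra of two samples. It is an illustration under stated assumptions, not the authors' implementation. The assumptions are that each sample is a list of reads, k-mers containing non-ACGT characters are skipped, reverse complements are not merged, and counts are normalized to relative frequencies. The paper's exact normalization may differ. The function names `kmer_spectrum` and `eks_distance` are hypothetical.

```python
from collections import Counter
from itertools import product
import math

ALPHABET = "ACGT"


def kmer_spectrum(reads, k):
    """Return relative frequencies of all 4**k DNA k-mers across the reads.

    Assumptions (hypothetical, for illustration): k-mers containing non-ACGT
    characters are skipped, reverse complements are NOT merged, and counts
    are normalized so the spectrum sums to 1 (or is all zeros if empty).
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        # Slide a window of width k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if all(c in ALPHABET for c in kmer):
                counts[kmer] += 1
    total = sum(counts.values())
    # Enumerate every possible k-mer so both spectra share the same coordinates.
    all_kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    if total == 0:
        return {km: 0.0 for km in all_kmers}
    return {km: counts[km] / total for km in all_kmers}


def eks_distance(reads_a, reads_b, k=4):
    """Euclidean distance between the k-mer spectra of two samples."""
    spec_a = kmer_spectrum(reads_a, k)
    spec_b = kmer_spectrum(reads_b, k)
    return math.sqrt(sum((spec_a[km] - spec_b[km]) ** 2 for km in spec_a))


if __name__ == "__main__":
    sample_a = ["ACGTACGTAC", "GGGTTTACGA"]
    sample_b = ["ACGTACGTTT", "CCCAAATGCA"]
    print(eks_distance(sample_a, sample_b, k=4))
```

The default `k=4` follows the small values of k (on the order of 4-5) discussed in the abstract. With k = 4 the spectrum has only 256 coordinates, so a dense representation over all k-mers is cheap.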