Article

Compositional Uncertainty Should Not Be Ignored in High-Throughput Sequencing Data Analysis

Journal

AUSTRIAN JOURNAL OF STATISTICS
Volume 45, Issue 4, Pages 73-87

Publisher

AUSTRIAN STATISTICAL SOC
DOI: 10.17713/ajs.v45i4.122

Keywords

Bayesian estimation; centred log-ratio; transcriptome; metagenome; 16S rRNA gene sequencing; ALDEx2; R

Funding

  1. Natural Sciences and Engineering Research Council of Canada

Abstract

High-throughput sequencing generates sparse compositional data, yet these datasets are rarely analyzed using a compositional approach. In addition, the variation inherent in these datasets is rarely acknowledged, and ignoring it can result in many false positive inferences. Using an in vitro selection dataset with an external standard of truth, we demonstrate that examining point estimates of the data can produce false positive results, even with appropriate zero-replacement approaches. We demonstrate the variation inherent in real high-throughput sequencing datasets and show that it can be approximated, and hence accounted for, by Monte Carlo sampling from the Dirichlet distribution. Used alone, this approximation is itself problematic, but it becomes useful when coupled with a log-ratio approach commonly used in compositional data analysis. Thus, the approach illustrated here, which merges Bayesian estimation with the principles of compositional data analysis, should be generally useful for high-dimensional count compositional data of the type generated by high-throughput sequencing.
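
The idea summarized in the abstract, Monte Carlo sampling from the Dirichlet distribution followed by a log-ratio transform, can be illustrated with a minimal sketch. This is not the ALDEx2 implementation and is not taken from the paper: the function names, the example counts, the prior of 0.5 added to every count, and the choice of 128 draws are all assumptions made for illustration. The sketch uses only the Python standard library and draws Dirichlet samples by normalizing independent Gamma variates.

```python
import math
import random
import statistics


def dirichlet_sample(alpha, rng):
    """Draw one Dirichlet(alpha) sample by normalizing Gamma(alpha_i, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def clr(proportions):
    """Centred log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in proportions]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


def clr_monte_carlo(counts, n_draws=128, prior=0.5, seed=0):
    """Return n_draws clr-transformed Dirichlet samples for one sample's counts.

    Assumption: a uniform prior is added to every count, so zero counts
    get a non-zero Dirichlet parameter instead of an ad hoc replacement.
    """
    rng = random.Random(seed)
    alpha = [c + prior for c in counts]
    return [clr(dirichlet_sample(alpha, rng)) for _ in range(n_draws)]


def feature_sd(draws, index):
    """Standard deviation of one feature's clr value across the draws."""
    return statistics.stdev(d[index] for d in draws)


# Hypothetical read counts for five features in a single sample.
counts = [0, 3, 120, 45, 1]
draws = clr_monte_carlo(counts)

for i, c in enumerate(counts):
    values = [d[i] for d in draws]
    print(f"count={c:>3}  clr mean={statistics.mean(values):6.2f}  "
          f"clr sd={feature_sd(draws, i):5.2f}")
```

Under these assumptions, features with few or zero reads show a much wider spread of clr values across draws than well-covered features. A point estimate hides that spread, which is the uncertainty the abstract argues should be carried into downstream inference.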
