Journal
AUSTRIAN JOURNAL OF STATISTICS
Volume 45, Issue 4, Pages 73-87
Publisher
AUSTRIAN STATISTICAL SOC
DOI: 10.17713/ajs.v45i4.122
Keywords
Bayesian estimation; centred log-ratio; transcriptome; metagenome; 16S rRNA gene sequencing; ALDEx2; R
Funding
- Natural Sciences and Engineering Research Council of Canada
High-throughput sequencing generates sparse compositional data, yet these datasets are rarely analyzed using a compositional approach. In addition, the variation inherent in these datasets is rarely acknowledged, and ignoring it can result in many false positive inferences. Using an in vitro selection dataset with an outside standard of truth, we demonstrate that examining point estimates of the data can produce false positive results, even with appropriate zero replacement approaches. We demonstrate the variation inherent in real high-throughput sequencing datasets and show that it can be approximated, and hence accounted for, by Monte-Carlo sampling from the Dirichlet distribution. This approximation is problematic when used by itself, but becomes useful when coupled with a log-ratio approach commonly used in compositional data analysis. Thus, the approach illustrated here, which merges Bayesian estimation with principles of compositional data analysis, should be generally useful for high-dimensional count compositional data of the type generated by high-throughput sequencing.
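The combination described in the abstract, Monte-Carlo sampling from the Dirichlet distribution followed by a centred log-ratio (clr) transform, can be sketched roughly as follows. This is a minimal illustration in Python with NumPy, not the authors' code or the ALDEx2 implementation. The function name `dirichlet_clr_instances`, the prior of 0.5, the 128 instances, and the count vector are all assumptions made for the example.

```python
import numpy as np


def dirichlet_clr_instances(counts, n_instances=128, prior=0.5, seed=0):
    """Draw Dirichlet Monte-Carlo instances for one sample and clr-transform each.

    counts      : observed read counts per feature (zeros allowed)
    n_instances : number of Monte-Carlo draws (assumed value)
    prior       : uniform prior added to every count (assumed value)
    """
    counts = np.asarray(counts, dtype=float)
    rng = np.random.default_rng(seed)
    # Each draw is a plausible vector of feature proportions given the counts;
    # the prior keeps zero-count features strictly positive.
    probs = rng.dirichlet(counts + prior, size=n_instances)
    log_p = np.log(probs)
    # clr: subtract the mean log value (log of the geometric mean) per draw.
    return log_p - log_p.mean(axis=1, keepdims=True)


# Hypothetical counts for five features in a single sample.
counts = [0, 3, 120, 45, 7]
clr = dirichlet_clr_instances(counts)

# Summarise each feature over the instances instead of using one point estimate.
print("mean clr per feature:", clr.mean(axis=0))
print("sd of clr per feature:", clr.std(axis=0))
```

In this sketch the spread of clr values across instances is much wider for the zero-count and low-count features than for the well-sampled ones, which is the kind of variation a single point estimate hides.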