4.7 Article

Estimating pseudocounts and fold changes for digital expression measurements

Journal

BIOINFORMATICS
Volume 34, Issue 23, Pages 4054-4063

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/bty471

Funding

  1. Helmholtz Institute for RNA-based Infection Research (HIRI) through Bavarian Ministry of Economic Affairs and Media, Energy and Technology [0703/68674/5/2017, 0703/89374/3/2017]

Motivation: Fold changes from count-based high-throughput experiments such as RNA-seq suffer from a zero-frequency problem. To circumvent division by zero, so-called pseudocounts are added to make all observed counts strictly positive. The magnitude of pseudocounts for digital expression measurements and the stage of the analysis at which they are introduced have remained arbitrary choices. Moreover, in the strict sense, fold changes are not quantities that can be computed. Instead, due to the stochasticity involved in the experiments, they must be estimated by statistical inference.

Results: Here, we build on a statistical framework for fold changes, where pseudocounts correspond to the parameters of the prior distribution used for Bayesian inference of the fold change. We show that arbitrary and widely used choices for applying pseudocounts can lead to biased results. As a statistically rigorous alternative, we propose and test an empirical Bayes procedure to choose appropriate pseudocounts. Moreover, we introduce the novel estimator Psi LFC for fold changes, which shows favorable properties with small counts and smaller deviations from the truth in simulations and real data compared to existing methods. Our results have direct implications for entities with few reads in sequencing experiments, and indirectly also affect results for entities with many reads.
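
The correspondence between pseudocounts and prior parameters described in the abstract can be illustrated with a minimal sketch. It assumes a beta-binomial model that the abstract does not spell out: the proportion p of reads coming from condition A gets a Beta(alpha, beta) prior, so that after observing counts a and b the posterior is Beta(a + alpha, b + beta), and the log2 fold change is log2(p / (1 - p)). Under that assumption the posterior mean of the log2 fold change is (digamma(a + alpha) - digamma(b + beta)) / ln 2. The function names and the default pseudocounts of 1 are hypothetical, and this is not claimed to be the authors' Psi LFC implementation or their empirical Bayes procedure.

```python
import math


def digamma(x: float) -> float:
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    if x <= 0:
        raise ValueError("digamma sketch only supports x > 0")
    result = 0.0
    # Shift x upward with psi(x) = psi(x + 1) - 1/x until the series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 * inv
    result -= inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result


def naive_lfc(a: float, b: float, pseudocount: float = 1.0) -> float:
    """Plug-in log2 fold change after adding the same pseudocount to both counts."""
    return math.log2((a + pseudocount) / (b + pseudocount))


def posterior_mean_lfc(a: float, b: float,
                       alpha: float = 1.0, beta: float = 1.0) -> float:
    """Posterior mean of log2(p / (1 - p)) under an assumed Beta(alpha, beta) prior.

    alpha and beta play the role of pseudocounts (hypothetical defaults).
    """
    return (digamma(a + alpha) - digamma(b + beta)) / math.log(2.0)


if __name__ == "__main__":
    for a, b in [(0, 0), (1, 0), (5, 2), (500, 200)]:
        print(a, b, round(naive_lfc(a, b), 3), round(posterior_mean_lfc(a, b), 3))
```

For counts a = 1 and b = 0 with unit pseudocounts, the plug-in value is log2(2) = 1, whereas the posterior mean under this assumed model is 1 / ln 2, about 1.44, so the two notions of "adding a pseudocount" differ most when counts are small and converge as counts grow.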
