Article

BAYESIAN LOCAL FALSE DISCOVERY RATE FOR SPARSE COUNT DATA WITH APPLICATION TO THE DISCOVERY OF HOTSPOTS IN PROTEIN DOMAINS

ANNALS OF APPLIED STATISTICS (2022)

Journal

ANNALS OF APPLIED STATISTICS

Volume 16, Issue 3, Pages 1459-1475

Publisher

INST MATHEMATICAL STATISTICS-IMS

DOI: 10.1214/21-AOAS1551

Keywords

Bayesian local false discovery rate; sparse count data; zero-inflated generalized Poisson; protein domains

Category

Statistics & Probability

Funding

National Research Foundation of Korea - Korea government (MSIT) [NRF-2019H1D3A2A02102167, NRF2020R1A2C1A01100526]

Abstract


In molecular-level cancer research, it is critical to understand which somatic mutations play an important role in the initiation or progression of cancer. Recently, studying cancer somatic variants at the protein domain level has become an important way to uncover functionally related somatic mutations. The main task is to find the protein domain hotspots, which have a significantly high frequency of mutations. Multiple testing procedures are commonly used to identify hotspots; however, when the data set is not large enough, existing methods produce unreliable results and fail to control a given type I error rate. We propose multiple testing procedures, based on the Bayesian local false discovery rate, for sparse count data and apply them to the identification of clusters of somatic mutations across entire gene families using protein domain models. In multiple testing for count data, it is not clear which null distribution should be assumed. In our proposed algorithms, we implement the zero assumption in the context of Bayesian methods to identify the null distribution for count data rather than using a theoretical null distribution. Furthermore, we address different ways of modeling the alternative distribution. The proposed fully Bayesian models are efficient when the number of counts is small (50 <= N < 200), while local false discovery rate procedures based on empirical Bayes are preferable for a large number of counts (N > 800). We provide numerical studies showing that the proposed fully Bayesian methods control a given level of false discovery rate for a small number of positions, whereas existing approaches based on nonparametric empirical Bayes fail to control the false discovery rate. In addition, we present real-data examples in which hotspots are selected from protein domain data.
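To make the idea of a local false discovery rate for counts concrete, the following is a minimal sketch and not the authors' method. It assumes a two-component mixture in which both the null and the alternative are plain Poisson distributions with known parameters; the paper instead works with zero-inflated generalized Poisson models and estimates the null through the zero assumption. The names `local_fdr` and `select_hotspots`, and the values of `pi0`, `lam_null`, and `lam_alt`, are hypothetical choices made only for illustration. The selection rule shown (reject the positions with the smallest local fdr for as long as their running average stays at or below the target level) is one common way to turn local fdr values into a decision.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability mass at a non-negative integer k (computed on the log scale)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def local_fdr(counts, pi0, lam_null, lam_alt):
    """Posterior probability that each count comes from the null component.

    Assumption (illustrative only): null ~ Poisson(lam_null),
    alternative ~ Poisson(lam_alt), null proportion pi0, all treated as known.
    """
    values = []
    for k in counts:
        null_part = pi0 * poisson_pmf(k, lam_null)
        alt_part = (1.0 - pi0) * poisson_pmf(k, lam_alt)
        values.append(null_part / (null_part + alt_part))
    return values


def select_hotspots(lfdr, alpha=0.05):
    """Indices of positions whose running mean local fdr stays at or below alpha."""
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])
    selected, running_sum = [], 0.0
    for rank, i in enumerate(order, start=1):
        running_sum += lfdr[i]
        # The running mean of sorted values never decreases, so stop at the first violation.
        if running_sum / rank > alpha:
            break
        selected.append(i)
    return sorted(selected)


# Hypothetical mutation counts at ten domain positions.
counts = [0, 1, 0, 2, 15, 0, 1, 12, 0, 3]
lfdr = local_fdr(counts, pi0=0.9, lam_null=1.0, lam_alt=10.0)
hotspots = select_hotspots(lfdr, alpha=0.05)
print(hotspots)  # [4, 7]
```

In this toy input, only the positions with counts 15 and 12 are flagged. The position with count 3 has a local fdr close to 1 under the assumed parameters, so including it would push the running average far above 0.05.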
