Journal
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
Volume 117, Issue 537, Pages 411-427
Publisher
TAYLOR & FRANCIS INC
DOI: 10.1080/01621459.2020.1783273
Keywords
Covariates; EM-algorithm; False discovery rate; Multiple testing
Funding
- NSF [DMS-1811747]
- Mayo Clinic Center for Individualized Medicine
This article introduces an FDR control procedure that can incorporate covariate information in large-scale inference problems. The proposed procedure is implemented using a fast algorithm and has been shown to have asymptotic validity even in cases of misspecified models and weakly dependent p-values. Extensive simulations demonstrate that the method improves upon existing approaches in terms of flexibility, robustness, power, and computational efficiency. The method is applied to omics datasets from genomics studies to identify features associated with clinical and biological phenotypes, and shows superiority, particularly in sparse signal scenarios.
Conventional multiple testing procedures often assume that hypotheses for different features are exchangeable. However, in many scientific applications, additional covariate information regarding the patterns of signals and nulls is available. In this article, we introduce an FDR control procedure for large-scale inference problems that can incorporate covariate information. We develop a fast algorithm to implement the proposed procedure and prove its asymptotic validity even when the underlying likelihood ratio model is misspecified and the p-values are weakly dependent (e.g., strong mixing). Extensive simulations are conducted to study the finite-sample performance of the proposed method, and we demonstrate that the new approach improves over the state-of-the-art approaches by being flexible, robust, powerful, and computationally efficient. Finally, we apply the method to several omics datasets arising from genomics studies with the aim of identifying omics features associated with some clinical and biological phenotypes. We show that the method is overall the most powerful among competing methods, especially when the signal is sparse. The proposed covariate adaptive multiple testing procedure is implemented in the R package CAMT. Supplementary materials for this article are available online.
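For readers unfamiliar with the baseline the abstract contrasts against, the following is a minimal sketch of the classical Benjamini-Hochberg step-up procedure, which treats all hypotheses as exchangeable and uses no covariates. It is not the procedure proposed in the article and does not reproduce the CAMT package; the function name `benjamini_hochberg` and the example p-values are illustrative assumptions.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Classical BH step-up procedure (covariate-free baseline, not CAMT).

    Returns a list of booleans, True where the hypothesis is rejected.
    """
    m = len(pvalues)
    if m == 0:
        return []
    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest 1-based rank k such that p_(k) <= alpha * k / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, for illustration only.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
decisions = benjamini_hochberg(pvals, alpha=0.05)
```

Because every hypothesis is compared against the same rank-based threshold, side information about which features are more likely to carry signal plays no role here; covariate-adaptive procedures such as the one described in the abstract aim to use that information.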