4.7 Article

Integrative High Dimensional Multiple Testing with Heterogeneity under Data Sharing Constraints

Journal

Publisher

MICROTOME PUBL

Keywords

Debiasing; Distributed learning; False discovery rate; High dimensional inference; Integrative analysis; Multiple testing

Funding

  1. NSFC [12022103, 11771094, 11690013]
  2. Translational Data Science Center for a Learning Health System at Harvard Medical School
  3. Harvard T.H. Chan School of Public Health
  4. [MVP000]
  5. [MVP001]

Identifying informative predictors in a high-dimensional regression model is crucial for association analysis and predictive modeling. Signal detection often fails in high-dimensional settings because of limited sample size, and meta-analyzing multiple studies can improve power. Integrative analysis of high-dimensional data from different studies is challenging under between-study heterogeneity, and more so when data sharing constraints allow only summary data to be shared. The paper proposes a method called DSILT (data shielding integrative large-scale testing) for signal detection without sharing individual-level data. The method uses integrative estimation and debiasing procedures to construct test statistics for the overall effects of specific covariates, and a multiple testing procedure identifies significant effects while controlling the false discovery rate. Simulation studies show that the proposed testing procedure controls false discoveries and attains good power.
Identifying informative predictors in a high dimensional regression model is a critical step for association analysis and predictive modeling. Signal detection in the high dimensional setting often fails due to the limited sample size. One approach to improving power is through meta-analyzing multiple studies which address the same scientific question. However, integrative analysis of high dimensional data from multiple studies is challenging in the presence of between-study heterogeneity. The challenge is even more pronounced with additional data sharing constraints under which only summary data can be shared across different sites. In this paper, we propose a novel data shielding integrative large-scale testing (DSILT) approach to signal detection allowing between-study heterogeneity and not requiring the sharing of individual level data. Assuming the underlying high dimensional regression models of the data differ across studies yet share similar support, the proposed method incorporates proper integrative estimation and debiasing procedures to construct test statistics for the overall effects of specific covariates. We also develop a multiple testing procedure to identify significant effects while controlling the false discovery rate (FDR) and false discovery proportion (FDP). Theoretical comparisons of the new testing procedure with the ideal individual-level meta-analysis (ILMA) approach and other distributed inference methods are investigated. Simulation studies demonstrate that the proposed testing procedure performs well in both controlling false discovery and attaining power. The new method is applied to a real example detecting interaction effects of the genetic variants for statins and obesity on the risk for type II diabetes.
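For readers unfamiliar with false discovery rate control, the following is a minimal sketch of the classical Benjamini-Hochberg step-up rule. It is not the DSILT procedure described in the abstract, which builds its own test statistics from debiased integrative estimates; it only illustrates what "controlling the FDR at level alpha" means for a list of p-values. The function name `benjamini_hochberg` and the example p-values are hypothetical, and the rule's guarantee assumes independent (or positively dependent) p-values.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices of hypotheses rejected by the BH step-up rule.

    Illustrative only: this is the textbook procedure, not the paper's method.
    """
    m = len(p_values)
    # Order hypothesis indices by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= alpha * k / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    return sorted(order[:k_max])


# Hypothetical p-values for ten covariates (made up for illustration).
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p_vals, alpha=0.05)
print(rejected)
```

Because the rule is step-up, a hypothesis can be rejected even when its own p-value exceeds its rank threshold, as long as some larger-ranked p-value falls below its threshold.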

Reviews

Primary rating

4.7
Insufficient ratings

Secondary ratings

Novelty
-
Importance
-
Scientific rigor
-