Article

Integrative High Dimensional Multiple Testing with Heterogeneity under Data Sharing Constraints

Journal

JOURNAL OF MACHINE LEARNING RESEARCH
Volume 22

Publisher

MICROTOME PUBL

Keywords

Debiasing; Distributed learning; False discovery rate; High dimensional inference; Integrative analysis; Multiple testing

Funding

  1. NSFC [12022103, 11771094, 11690013]
  2. Translational Data Science Center for a Learning Health System at Harvard Medical School
  3. Harvard T.H. Chan School of Public Health
  4. [MVP000]
  5. [MVP001]

Identifying informative predictors in a high-dimensional regression model is crucial for association analysis and predictive modeling. Signal detection often fails in high-dimensional settings because of limited sample size, and meta-analyzing multiple studies can improve power. Integrative analysis of high-dimensional data from different studies is challenging under between-study heterogeneity, and more so when data sharing constraints allow only summary data to be exchanged. The paper proposes DSILT, a method for signal detection that does not require sharing individual-level data. It uses integrative estimation and debiasing to construct test statistics for the overall effects of specific covariates, and a multiple testing procedure to identify significant effects while controlling the false discovery rate. In simulation studies, the procedure controls false discoveries well and attains good power.
Identifying informative predictors in a high dimensional regression model is a critical step for association analysis and predictive modeling. Signal detection in the high dimensional setting often fails due to the limited sample size. One approach to improving power is through meta-analyzing multiple studies which address the same scientific question. However, integrative analysis of high dimensional data from multiple studies is challenging in the presence of between-study heterogeneity. The challenge is even more pronounced with additional data sharing constraints under which only summary data can be shared across different sites. In this paper, we propose a novel data shielding integrative large-scale testing (DSILT) approach to signal detection allowing between-study heterogeneity and not requiring the sharing of individual level data. Assuming the underlying high dimensional regression models of the data differ across studies yet share similar support, the proposed method incorporates proper integrative estimation and debiasing procedures to construct test statistics for the overall effects of specific covariates. We also develop a multiple testing procedure to identify significant effects while controlling the false discovery rate (FDR) and false discovery proportion (FDP). Theoretical comparisons of the new testing procedure with the ideal individual-level meta-analysis (ILMA) approach and other distributed inference methods are investigated. Simulation studies demonstrate that the proposed testing procedure performs well in both controlling false discovery and attaining power. The new method is applied to a real example detecting interaction effects of the genetic variants for statins and obesity on the risk for type II diabetes.
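
The general workflow described in the abstract (each site shares only summary statistics, a combined statistic is formed per covariate, and a multiple testing step controls the FDR) can be illustrated with a minimal sketch. This is not the DSILT procedure: it swaps in a plain sum-of-squares combination of per-site z-scores and the standard Benjamini-Hochberg step-up rule. The function names, the toy z-scores, the level alpha = 0.1, and the assumption of independent standard normal z-scores under the null are all hypothetical choices made for illustration. The chi-square tail uses a closed form that holds only for an even number of sites, chosen purely to avoid external dependencies.

```python
import math


def chi2_sf_even_df(x, df):
    """Survival function P(X > x) for a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # sf(x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def combine_sites(site_z):
    """Combine per-site z-scores into one p-value per covariate.

    site_z[m][j] is the z-score for covariate j reported by site m.
    Only these summary statistics are pooled, never individual-level data.
    Summing squares keeps power when effect signs differ across sites.
    """
    n_sites = len(site_z)
    n_covariates = len(site_z[0])
    pvals = []
    for j in range(n_covariates):
        stat = sum(site_z[m][j] ** 2 for m in range(n_sites))
        pvals.append(chi2_sf_even_df(stat, n_sites))
    return pvals


def benjamini_hochberg(pvals, alpha=0.1):
    """Return sorted indices rejected by the Benjamini-Hochberg procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= alpha * rank / m:
            k = rank  # largest rank whose p-value clears its threshold
    return sorted(order[:k])


# Toy summary statistics from two hypothetical sites, five covariates.
site_z = [
    [4.0, 0.2, -3.5, 0.5, 1.0],
    [3.8, -0.1, 3.9, -0.4, 0.3],
]
pvals = combine_sites(site_z)
discoveries = benjamini_hochberg(pvals, alpha=0.1)
print(discoveries)  # → [0, 2]
```

In this toy input, covariate 2 has opposite signs at the two sites, so a signed average of the z-scores would largely cancel, while the sum of squares still flags it. That is one simple way to accommodate between-study heterogeneity, and it should not be read as the test statistic constructed in the paper.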
