Article

Combining the strengths of inverse-variance weighting and Egger regression in Mendelian randomization using a mixture of regressions model

Journal

PLOS GENETICS
Volume 17, Issue 11

Publisher

PUBLIC LIBRARY SCIENCE
DOI: 10.1371/journal.pgen.1009922

Funding

  1. NIH [R01 AG069895, R01 AG065636, RF1 AG067924, R01 GM126002, R01 GM113250, R01 HL116720]
  2. NSF [DMS1711226]

The study presents a mixture model combining IVW and Egger regression to improve statistical power and robustness, considering both valid and invalid IVs. Model averaging and data perturbation schemes are proposed to address uncertainties in model/IV selection for more robust statistical inference in finite samples. Extensive simulations and applications demonstrate the superiority of the proposed methods over IVW and Egger regression, highlighting their potential utility in causal inference.
With the increasing availability of large-scale GWAS summary data on various traits, Mendelian randomization (MR) has become a common tool for inferring causality between a pair of traits, an exposure and an outcome. It relies on genetic variants, typically SNPs, as instrumental variables (IVs). The inverse-variance weighted (IVW) method (with a fixed-effect meta-analysis model) is most powerful when all IVs are valid; however, when horizontal pleiotropy is present, it may lead to biased inference. On the other hand, Egger regression is one of the most widely used methods robust to (uncorrelated) pleiotropy, but it suffers from loss of power. We propose a two-component mixture of regressions to combine and thus take advantage of both IVW and Egger regression; by accounting for both valid and invalid IVs, it is often both more efficient (i.e., higher powered) and more robust to pleiotropy (i.e., better at controlling type I error) than either IVW or Egger regression alone. We propose a model averaging approach and a novel data perturbation scheme to account for uncertainties in model/IV selection, leading to more robust statistical inference for finite samples. Through extensive simulations and applications to the GWAS summary data of 48 risk factor-disease pairs and 63 genetically uncorrelated trait pairs, we show that our proposed methods could often control type I error better while achieving much higher power than IVW and Egger regression (and sometimes than several other new/popular MR methods). We expect that our proposed methods will be a useful addition to the toolbox of Mendelian randomization for causal inference.

Author summary

For causal inference, inverse-variance weighting (IVW) and Egger regression are two of the most widely applied Mendelian randomization methods. IVW is the most powerful under the perhaps overly restrictive assumption that all IVs are valid, while Egger regression is often unnecessarily flexible in assuming all IVs to be invalid with uncorrelated pleiotropic effects. In spite of their usefulness, we point out their limitations: an IVW estimate of a causal effect would be biased if some/all IVs have directional pleiotropic effects, and an Egger regression estimate has too large a variance, leading to its loss of power. Accordingly, we propose a mixture model that combines them to take advantage of their strengths while overcoming their major limitations. Furthermore, we propose a model-averaging approach and a novel data perturbation scheme to account for uncertainties in model/IV selection, leading to more robust statistical inference. Through simulations and applications to some publicly available large-scale GWAS summary data, we demonstrate the superiority of our methods over IVW and Egger regression (and over some other state-of-the-art MR methods in some scenarios).
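
The trade-off described above can be illustrated with a minimal sketch of the two baseline estimators on simulated summary statistics. This is not the proposed mixture model, only the textbook forms of fixed-effect IVW (weighted regression through the origin) and Egger regression (the same regression with an intercept). The function names, the simulated effect sizes, and the choice of a constant directional pleiotropic effect on every SNP are assumptions made for illustration.

```python
import numpy as np

def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect IVW: weighted regression of beta_y on beta_x with no intercept."""
    w = 1.0 / se_y**2
    denom = np.sum(w * beta_x**2)
    theta = np.sum(w * beta_x * beta_y) / denom
    return theta, np.sqrt(1.0 / denom)

def egger_estimate(beta_x, beta_y, se_y):
    """Egger regression: same weights, plus an intercept absorbing directional pleiotropy."""
    # Orient each SNP so that its association with the exposure is non-negative.
    sign = np.where(beta_x < 0, -1.0, 1.0)
    bx, by = beta_x * sign, beta_y * sign
    w = 1.0 / se_y**2
    X = np.column_stack([np.ones_like(bx), bx])
    XtWX = X.T @ (w[:, None] * X)
    coef = np.linalg.solve(XtWX, X.T @ (w * by))
    cov = np.linalg.inv(XtWX)  # covariance with residual dispersion fixed at 1 (a simplification)
    return coef[0], coef[1], np.sqrt(cov[1, 1])

# Simulated summary statistics (all values hypothetical).
rng = np.random.default_rng(0)
m = 50                    # number of SNPs used as IVs
theta_true = 0.3          # true causal effect
alpha = 0.02              # directional pleiotropic effect shared by all SNPs
beta_x = rng.uniform(0.05, 0.2, size=m)
se_y = np.full(m, 0.01)
beta_y = theta_true * beta_x + alpha + rng.normal(0.0, se_y)

ivw_theta, ivw_se = ivw_estimate(beta_x, beta_y, se_y)
egger_alpha, egger_theta, egger_se = egger_estimate(beta_x, beta_y, se_y)

print(f"IVW:   theta = {ivw_theta:.3f} (SE {ivw_se:.3f})")
print(f"Egger: theta = {egger_theta:.3f} (SE {egger_se:.3f}), intercept = {egger_alpha:.3f}")
```

In this setup every IV is invalid, so the IVW estimate is pulled away from the true effect while the Egger slope stays close to it, at the cost of a larger standard error. That loss of precision is the power loss the abstract refers to.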
