Article

Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies

Journal

NATURE GENETICS
Volume 50, Issue 9, Pages 1335-+

Publisher

NATURE PUBLISHING GROUP
DOI: 10.1038/s41588-018-0184-y

Funding

  1. NIH [R01 HG008773, R35 HL135824, R01 LM010685, U2C OD023196]
  2. University of Michigan Rackham Predoctoral Fellowship
  3. Danish Heart Foundation
  4. Lundbeck Foundation
  5. EUNICE KENNEDY SHRIVER NATIONAL INSTITUTE OF CHILD HEALTH & HUMAN DEVELOPMENT [U54HD083211] Funding Source: NIH RePORTER
  6. NATIONAL HEART, LUNG, AND BLOOD INSTITUTE [R01HL133786, R35HL135824] Funding Source: NIH RePORTER
  7. NATIONAL HUMAN GENOME RESEARCH INSTITUTE [T32HG000040, R01HG008773] Funding Source: NIH RePORTER
  8. NATIONAL LIBRARY OF MEDICINE [R01LM010685] Funding Source: NIH RePORTER
  9. OFFICE OF THE DIRECTOR, NATIONAL INSTITUTES OF HEALTH [U2COD023196] Funding Source: NIH RePORTER

In genome-wide association studies (GWAS) for thousands of phenotypes in large biobanks, most binary traits have substantially fewer cases than controls. Both of the widely used approaches, the linear mixed model and the recently proposed logistic mixed model, perform poorly: they produce inflated type I error rates when used to analyze unbalanced case-control phenotypes. Here we propose a scalable and accurate generalized mixed model association test that uses the saddlepoint approximation to calibrate the distribution of score test statistics. This method, SAIGE (Scalable and Accurate Implementation of GEneralized mixed model), provides accurate P values even when case-control ratios are extremely unbalanced. SAIGE uses state-of-the-art optimization strategies to reduce computational costs; hence, it is applicable to GWAS for thousands of phenotypes in large biobanks. Through the analysis of UK Biobank data of 408,961 samples from white British participants with European ancestry for >1,400 binary phenotypes, we show that SAIGE can efficiently analyze large sample data, controlling for unbalanced case-control ratios and sample relatedness.
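
The calibration idea in the abstract can be illustrated with a minimal sketch. It is not the SAIGE implementation: it assumes independent samples (no random effect for relatedness), takes the fitted null probabilities `mu` as given, and omits the computational optimizations. The function names, the bisection bracket, and the 2-standard-deviation cutoff for falling back to the normal approximation are all assumptions made for illustration. The score statistic is S = sum_i g_i (y_i - mu_i) with y_i ~ Bernoulli(mu_i), and the upper tail is approximated with the Barndorff-Nielsen form of the saddlepoint formula.

```python
import math

def cgf(t, g, mu):
    """Cumulant generating function K(t) of S = sum_i g_i * (y_i - mu_i)."""
    return sum(math.log(1 - m + m * math.exp(gi * t)) - t * gi * m
               for gi, m in zip(g, mu))

def cgf_d1(t, g, mu):
    """First derivative K'(t)."""
    total = 0.0
    for gi, m in zip(g, mu):
        e = math.exp(gi * t)
        total += gi * m * e / (1 - m + m * e) - gi * m
    return total

def cgf_d2(t, g, mu):
    """Second derivative K''(t); always positive, so K' is increasing."""
    total = 0.0
    for gi, m in zip(g, mu):
        e = math.exp(gi * t)
        total += gi * gi * m * (1 - m) * e / (1 - m + m * e) ** 2
    return total

def norm_sf(x):
    """Upper tail of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def normal_upper_tail(s, g, mu):
    """Normal approximation to P(S >= s), using Var(S) = K''(0)."""
    return norm_sf(s / math.sqrt(cgf_d2(0.0, g, mu)))

def spa_upper_tail(s, g, mu, lo=-50.0, hi=50.0, cutoff=2.0):
    """Saddlepoint approximation to P(S >= s).

    Assumes s lies strictly inside the support of S. Within `cutoff`
    standard deviations of the mean the normal approximation is used,
    because the saddlepoint formula is unstable near t = 0.
    """
    sd = math.sqrt(cgf_d2(0.0, g, mu))
    if abs(s) < cutoff * sd:
        return norm_sf(s / sd)
    if cgf_d1(lo, g, mu) >= s or cgf_d1(hi, g, mu) <= s:
        raise ValueError("no saddlepoint found in the search bracket")
    # Bisection on the increasing function K'(t) - s.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cgf_d1(mid, g, mu) < s:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    w = math.copysign(math.sqrt(max(2.0 * (t * s - cgf(t, g, mu)), 0.0)), t)
    v = t * math.sqrt(cgf_d2(t, g, mu))
    return norm_sf(w + math.log(v / w) / w)

# Toy unbalanced example: 1,000 samples, 1% case probability, genotype 1 for all.
g = [1.0] * 1000
mu = [0.01] * 1000
s = 15.0  # observed score: 25 cases against 10 expected
p_spa = spa_upper_tail(s, g, mu)
p_norm = normal_upper_tail(s, g, mu)
print(p_spa, p_norm)
```

In this toy setting the normal approximation returns a much smaller tail probability than the saddlepoint formula, which is the kind of miscalibration for rare binary traits that the abstract describes. For a lower tail, the same function can be applied to `-s` with the genotypes negated.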
