Journal
AMERICAN JOURNAL OF HUMAN GENETICS
Volume 105, Issue 6, Pages 1182-1192
Publisher
CELL PRESS
DOI: 10.1016/j.ajhg.2019.10.008
Funding
- National Institutes of Health [R01 HG008773]
- [45227]
The etiology of most complex diseases involves genetic variants, environmental factors, and gene-environment interaction (G×E) effects. Compared with marginal genetic association studies, G×E analysis requires larger samples and detailed measures of environmental exposures, which limits the discoveries that can be made. Large-scale population-based biobanks with detailed phenotypic and environmental information, such as UK Biobank, can be ideal resources for identifying G×E effects. However, because of the high computational cost and the presence of case-control imbalance, existing methods often fail. Here we propose a scalable and accurate method, SPAGE (SaddlePoint Approximation implementation of G×E analysis), that is applicable to genome-wide, phenome-wide G×E studies. To reduce computational cost, SPAGE fits a genotype-independent logistic model only once for the entire genome-wide analysis. To handle phenotypes with unbalanced case-control ratios, SPAGE uses a saddlepoint approximation (SPA) to calibrate the test statistics. Simulation studies show that SPAGE is 33-79 times faster than the Wald test and 72-439 times faster than Firth's test, and that SPAGE controls type I error rates at the genome-wide significance level even when case-control ratios are extremely unbalanced. Through the analysis of UK Biobank data from 344,341 white British European-ancestry samples, we show that SPAGE can efficiently analyze large samples while controlling for unbalanced case-control ratios.
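The idea of calibrating a score statistic with a saddlepoint approximation can be illustrated with a minimal Python sketch. It is not the SPAGE implementation: it assumes a generic score statistic S = sum_i g_i (Y_i - mu_i), where mu_i are fitted case probabilities from a null logistic model that was fitted once beforehand, and g_i is whatever covariate is being tested (for G×E this would be a suitably adjusted interaction term, which is not derived here). The function names, the bisection solver, and the fallback to a normal approximation within 2 standard deviations are all choices made for this illustration.

```python
import math
import numpy as np


def norm_sf(z):
    """Upper tail of the standard normal, P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def _logit_shift(t, g, mu):
    # x_i = g_i * t + logit(mu_i); requires 0 < mu_i < 1.
    return g * t + np.log(mu) - np.log1p(-mu)


def cgf(t, g, mu):
    """Cumulant generating function K(t) of S under Y_i ~ Bernoulli(mu_i)."""
    x = _logit_shift(t, g, mu)
    return float(np.sum(np.log1p(-mu) + np.logaddexp(0.0, x)) - t * np.sum(g * mu))


def cgf_d1(t, g, mu):
    """First derivative K'(t)."""
    p = np.exp(-np.logaddexp(0.0, -_logit_shift(t, g, mu)))  # tilted probabilities
    return float(np.sum(g * p) - np.sum(g * mu))


def cgf_d2(t, g, mu):
    """Second derivative K''(t); K''(0) is the null variance of S."""
    p = np.exp(-np.logaddexp(0.0, -_logit_shift(t, g, mu)))
    return float(np.sum(g ** 2 * p * (1.0 - p)))


def solve_saddlepoint(s, g, mu, iters=200):
    """Solve K'(t) = s by bisection (K' is increasing in t)."""
    lo, hi = -1.0, 1.0
    while cgf_d1(lo, g, mu) > s and lo > -1e6:
        lo *= 2.0
    while cgf_d1(hi, g, mu) < s and hi < 1e6:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cgf_d1(mid, g, mu) < s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def spa_tail(s, g, mu, upper=True, cutoff=2.0):
    """Approximate P(S > s) if upper, else P(S <= s)."""
    centre = float(np.sum(g * mu))
    s_min = float(np.sum(np.minimum(g, 0.0))) - centre
    s_max = float(np.sum(np.maximum(g, 0.0))) - centre
    # Outside the support no saddlepoint exists; return the trivial bounds
    # (this ignores the point mass exactly at the boundary).
    if s >= s_max:
        return 0.0 if upper else 1.0
    if s <= s_min:
        return 1.0 if upper else 0.0
    z0 = s / math.sqrt(cgf_d2(0.0, g, mu))
    if abs(z0) < cutoff:  # near the mean, use the normal approximation
        return norm_sf(z0) if upper else norm_sf(-z0)
    t = solve_saddlepoint(s, g, mu)
    w = math.copysign(math.sqrt(max(2.0 * (t * s - cgf(t, g, mu)), 0.0)), t)
    v = t * math.sqrt(cgf_d2(t, g, mu))
    z = w + math.log(v / w) / w  # Barndorff-Nielsen form of the tail formula
    return norm_sf(z) if upper else norm_sf(-z)


def spa_pvalue(s, g, mu):
    """Two-sided p-value as the sum of both tail probabilities at |s|."""
    a = abs(s)
    return min(1.0, spa_tail(a, g, mu, True) + spa_tail(-a, g, mu, False))


# Illustrative (made-up) inputs: 2000 samples, 1% expected case rate.
g_demo = np.ones(2000)
mu_demo = np.full(2000, 0.01)
print(spa_pvalue(22.0, g_demo, mu_demo))
```

With a rare phenotype the null distribution of S is strongly skewed, so a normal approximation understates the upper tail; the saddlepoint tail uses the full cumulant generating function and therefore tracks that skew. Because mu is computed once and reused for every variant, only the per-variant vector g changes between calls.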