Journal
GENETIC EPIDEMIOLOGY
Volume 43, Issue 3, Pages 263-275
Publisher
WILEY
DOI: 10.1002/gepi.22188
Keywords
rank-normalization; rare variants; whole-genome sequencing.
Funding
- National Heart, Lung, and Blood Institute [1R35HL135818, 3R01HL-117626-02S1, 3R01HL-120393-02S1, R01 HL092577-06S1, R01HL120393-03S1, T32 HL129982, U01 HL072515, U01 HL137181, U01 HL84756]
- National Human Genome Research Institute [R01HG005827]
- National Institute of Diabetes and Digestive and Kidney Diseases [P30 DK72488]
- University of North Carolina [HHSN268201300001I/N01-HC-65233]
- University of Miami [HHSN268201300004I, N01-HC-65234]
- Albert Einstein College of Medicine [HHSN268201300002I, N01-HC-65235]
- University of Illinois at Chicago [HHSN268201300003I, N01-HC-65236]
- San Diego State University [HHSN268201300005I, N01-HC-65237]
- NHLBI
- NIMHD [HHSN268201300049C, HHSN268201300050C]
- Tougaloo College [HHSN268201300048C]
- University of Mississippi Medical Center [HHSN268201300046C, HHSN268201300047C]
- National Institutes of Health [U01 HL072515, U01 HL137181, U01 HL84756, P30 DK72488]
Abstract
When testing genotype-phenotype associations using linear regression, departure of the trait distribution from normality can impact both Type I error rate control and statistical power, with worse consequences for rarer variants. Because genotypes are expected to have small effects (if any), investigators now routinely use a two-stage method: they first regress the trait on covariates, obtain residuals, and rank-normalize them, and then use the rank-normalized residuals in association analysis with the genotypes. Potential confounding signals are assumed to be removed at the first stage, so in practice no further adjustment is done in the second stage. Here, we show that this widely used approach can lead to tests with undesirable statistical properties, due to a combination of a mis-specified mean-variance relationship and remaining covariate associations between the rank-normalized residuals and genotypes. We demonstrate these properties theoretically, and also in applications to genome-wide and whole-genome sequencing association studies. We further propose and evaluate an alternative, fully adjusted two-stage approach that adjusts for covariates both when residuals are obtained and in the subsequent association test. This method can reduce excess Type I errors and improve statistical power.
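The two procedures contrasted in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`ols_residuals`, `rank_normalize`, `genotype_effect`), the rank offset `(k - 0.5) / n`, and the simulated data (sample size, allele frequency, covariate effects) are all hypothetical choices made for the example. The only difference between the two analyses is whether the covariates are included again in the second-stage regression.

```python
import numpy as np
from statistics import NormalDist


def ols_residuals(y, X):
    """Residuals from least-squares regression of y on X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return y - X1 @ beta


def rank_normalize(r):
    """Replace each value by the normal quantile of its rank.

    Assumes no ties; the (k - 0.5) / n offset is one common choice,
    not necessarily the one used in the paper.
    """
    ranks = np.argsort(np.argsort(r)) + 1  # ranks 1..n
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf((k - 0.5) / len(r)) for k in ranks])


def genotype_effect(z, g, X=None):
    """Estimate and t-statistic for g when regressing z on g (and X, if given)."""
    cols = [np.ones(len(z)), g] if X is None else [np.ones(len(z)), g, X]
    D = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(D, z, rcond=None)
    resid = z - D @ beta
    sigma2 = resid @ resid / (len(z) - D.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(D.T @ D)[1, 1])
    return beta[1], beta[1] / se


# Simulated data (illustrative only): skewed trait, no true genotype effect.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))                       # covariates
g = rng.binomial(2, 0.05, size=n).astype(float)   # low-frequency variant
y = np.exp(0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=n))

# Stage 1 (shared): regress trait on covariates, rank-normalize residuals.
resid = ols_residuals(y, X)
z = rank_normalize(resid)

# Stage 2, partly adjusted: genotype only (the widely used approach).
beta_partial, t_partial = genotype_effect(z, g)

# Stage 2, fully adjusted: genotype plus the same covariates again.
beta_full, t_full = genotype_effect(z, g, X)
```

Rank-normalization is a nonlinear transform, so `z` is generally no longer orthogonal to `X` even though `resid` is; re-including `X` in the second stage is what the fully adjusted variant changes. Comparing `t_partial` and `t_full` over many simulated replicates would be one way to examine the Type I error behavior the abstract describes.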