Making the Most of Clumping and Thresholding for Polygenic Scores

AMERICAN JOURNAL OF HUMAN GENETICS (2019)

Journal

AMERICAN JOURNAL OF HUMAN GENETICS

Volume 105, Issue 6, Pages 1213-1221

Publisher

CELL PRESS

DOI: 10.1016/j.ajhg.2019.11.001

Category

Genetics & Heredity

Funding

LabEx PERSYVAL-Lab [ANR-11-LABX-0025-01]
ANR project FROGH [ANR-16-CE12-0033]
French National Research Agency under the Investissements d'avenir program [ANR-15-IDEX-02]
Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH [R248-2017-2003]
Agence Nationale de la Recherche (ANR) [ANR-16-CE12-0033]
MRC [MC_PC_12028] (funding source: UKRI)

Abstract

Polygenic prediction has the potential to contribute to precision medicine. Clumping and thresholding (C+T) is a widely used method to derive polygenic scores. When using C+T, several p value thresholds are tested to maximize the predictive ability of the derived polygenic scores. Along with this p value threshold, we propose to tune three other hyper-parameters for C+T. We implement an efficient way to derive thousands of different C+T scores corresponding to a grid over four hyper-parameters. For example, it takes a few hours to derive 123K different C+T scores for 300K individuals and 1M variants using 16 physical cores. We find that optimizing over these four hyper-parameters improves the predictive performance of C+T in both simulations and real data applications, compared to tuning only the p value threshold. A particularly large increase is seen when predicting depression status, from an AUC of 0.557 (95% CI: [0.544-0.569]) when tuning only the p value threshold to an AUC of 0.592 (95% CI: [0.580-0.604]) when tuning all four hyper-parameters we propose for C+T. We further propose stacked clumping and thresholding (SCT), a polygenic score that results from stacking all derived C+T scores. Instead of choosing the one set of hyper-parameters that maximizes prediction in some training set, SCT learns an optimal linear combination of all C+T scores by using an efficient penalized regression. We apply SCT to eight different case-control diseases in the UK Biobank data and find that SCT substantially improves prediction accuracy, with an average AUC increase of 0.035 over standard C+T.
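To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It builds a small grid of C+T scores and then stacks them. Several things are assumptions made for illustration: the grid covers only two hyper-parameters (the p value threshold and a clumping r² threshold), since the abstract does not name the other three; the stacking step uses closed-form ridge regression as a stand-in for the penalized regression mentioned above; and all function names (`clump`, `ct_scores`, `stack_ridge`) and the toy genotypes, effect sizes, and p values are hypothetical.

```python
import numpy as np

def clump(geno, pvals, r2_thr):
    """Greedy clumping: keep the most significant variant, drop variants correlated with it."""
    r2 = np.corrcoef(geno, rowvar=False) ** 2      # variant-by-variant squared correlation
    remaining = np.ones(len(pvals), dtype=bool)
    kept = []
    for j in np.argsort(pvals):                    # visit variants from most to least significant
        if remaining[j]:
            kept.append(j)
            remaining &= r2[j] <= r2_thr           # discard everything too correlated with j
    return np.array(kept)

def ct_scores(geno, betas, pvals, r2_grid, p_grid):
    """One C+T score per (r2 threshold, p value threshold) pair; shape (n_individuals, n_scores)."""
    scores = []
    for r2_thr in r2_grid:
        kept = clump(geno, pvals, r2_thr)          # clumping is shared across p value thresholds
        for p_thr in p_grid:
            use = kept[pvals[kept] <= p_thr]       # thresholding step
            scores.append(geno[:, use] @ betas[use])
    return np.column_stack(scores)

def stack_ridge(scores, y, lam=1.0):
    """Assumed stand-in for SCT's penalized regression: ridge fit of y on all C+T scores."""
    X = np.column_stack([np.ones(len(y)), scores])
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0                            # leave the intercept unpenalized
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

# Toy data (entirely synthetic, for illustration only).
rng = np.random.default_rng(0)
n, m = 200, 30
geno = rng.binomial(2, 0.3, size=(n, m)).astype(float)   # allele counts 0/1/2
betas = rng.normal(size=m)                               # hypothetical GWAS effect sizes
pvals = rng.uniform(size=m)                              # hypothetical GWAS p values
liability = geno @ betas + rng.normal(size=n)
y = (liability > np.median(liability)).astype(float)     # toy case-control status

scores = ct_scores(geno, betas, pvals, r2_grid=[0.05, 0.2, 0.8], p_grid=[1e-2, 0.1, 1.0])
weights = stack_ridge(scores, y)
sct = np.column_stack([np.ones(n), scores]) @ weights    # stacked score per individual
```

Standard C+T would pick the single column of `scores` that predicts best in a training set, whereas the stacked score weights all columns jointly. In any real analysis the stacking weights would be learned on a training set and evaluated on separate individuals, which this toy example does not do.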
