Making the Most of Clumping and Thresholding for Polygenic Scores

AMERICAN JOURNAL OF HUMAN GENETICS (2019)

Journal

AMERICAN JOURNAL OF HUMAN GENETICS

Volume 105, Issue 6, Pages 1213-1221

Publisher

CELL PRESS

DOI: 10.1016/j.ajhg.2019.11.001

Funding

LabEx PERSYVAL-Lab [ANR-11-LABX-0025-01]
ANR project FROGH [ANR-16-CE12-0033]
French National Research Agency under the Investissements d'avenir program [ANR-15-IDEX-02]
Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH [R248-2017-2003]
Agence Nationale de la Recherche (ANR) [ANR-16-CE12-0033] Funding Source: Agence Nationale de la Recherche (ANR)
MRC [MC_PC_12028] Funding Source: UKRI

Abstract

Polygenic prediction has the potential to contribute to precision medicine. Clumping and thresholding (C+T) is a widely used method to derive polygenic scores. When using C+T, several p value thresholds are tested to maximize the predictive ability of the derived polygenic scores. Along with this p value threshold, we propose to tune three other hyper-parameters for C+T. We implement an efficient way to derive thousands of different C+T scores corresponding to a grid over these four hyper-parameters. For example, it takes a few hours to derive 123K different C+T scores for 300K individuals and 1M variants using 16 physical cores. We find that optimizing over all four hyper-parameters improves the predictive performance of C+T in both simulations and real data applications, compared with tuning only the p value threshold. A particularly large increase is observed when predicting depression status: the AUC rises from 0.557 (95% CI: [0.544-0.569]) when tuning only the p value threshold to 0.592 (95% CI: [0.580-0.604]) when tuning all four hyper-parameters we propose for C+T. We further propose stacked clumping and thresholding (SCT), a polygenic score obtained by stacking all derived C+T scores. Instead of choosing the one set of hyper-parameters that maximizes prediction in a training set, SCT learns an optimal linear combination of all C+T scores using an efficient penalized regression. We apply SCT to eight case-control diseases in the UK Biobank data and find that SCT substantially improves prediction accuracy, with an average AUC increase of 0.035 over standard C+T.
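To make the difference between standard C+T tuning and stacking concrete, here is a minimal sketch in Python using only the standard library. It is not the authors' implementation. It assumes clumping has already been done, so `betas` and `pvals` cover only the clumped variants, and it varies only the p value threshold; the three other hyper-parameters mentioned above would add further loops over the grid. The function names `ct_scores`, `auc`, `best_threshold` and `stack`, and the toy data, are hypothetical. `stack` takes its weights as given rather than learning them with the penalized regression that SCT uses.

```python
from typing import Dict, List, Sequence

# Hypothetical toy sketch, not the paper's implementation.
# Only the p value threshold is tuned; clumping is assumed to be done already,
# so betas/pvals describe the clumped variants only.


def ct_scores(
    genotypes: Sequence[Sequence[float]],  # individuals x variants (dosages 0/1/2)
    betas: Sequence[float],                # GWAS effect sizes of the clumped variants
    pvals: Sequence[float],                # GWAS p values of the same variants
    thresholds: Sequence[float],           # grid of p value thresholds
) -> Dict[float, List[float]]:
    """Thresholding step of C+T: one score vector per p value threshold."""
    scores: Dict[float, List[float]] = {}
    for thr in thresholds:
        keep = [j for j, p in enumerate(pvals) if p < thr]
        # Score = sum of dosage * effect size over variants passing the threshold.
        scores[thr] = [sum(row[j] * betas[j] for j in keep) for row in genotypes]
    return scores


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(random case scores higher than random control); ties count 1/2."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def best_threshold(scores_by_thr: Dict[float, List[float]],
                   labels: Sequence[int]) -> float:
    """Standard C+T tuning: keep the single threshold whose score maximizes AUC."""
    return max(scores_by_thr, key=lambda thr: auc(scores_by_thr[thr], labels))


def stack(scores_by_thr: Dict[float, List[float]], weights: Dict[float, float],
          intercept: float = 0.0) -> List[float]:
    """SCT-style combination: a linear combination of all C+T scores.

    The weights are assumed to be given; SCT would learn them with a
    penalized regression, which is not shown here.
    """
    n = len(next(iter(scores_by_thr.values())))
    return [intercept + sum(w * scores_by_thr[thr][i] for thr, w in weights.items())
            for i in range(n)]


# Toy data: 4 individuals, 3 clumped variants; the first two individuals are cases.
G = [[0, 1, 2], [2, 0, 1], [1, 2, 0], [0, 0, 1]]
betas = [0.5, -0.2, 0.1]
pvals = [1e-8, 1e-3, 0.2]
labels = [1, 1, 0, 0]

scores = ct_scores(G, betas, pvals, [1e-5, 1e-2, 1.0])
print(best_threshold(scores, labels))  # 1e-05
```

Standard C+T keeps only the score column returned by `best_threshold`, whereas an SCT-style approach keeps every column and assigns each a weight. In this toy example the AUC is computed on the same individuals used for tuning; in practice the hyper-parameters or stacking weights would be chosen on a training set and evaluated on a separate test set.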