Article

Making the Most of Clumping and Thresholding for Polygenic Scores

Journal

AMERICAN JOURNAL OF HUMAN GENETICS
Volume 105, Issue 6, Pages 1213-1221

Publisher

CELL PRESS
DOI: 10.1016/j.ajhg.2019.11.001

Funding

  1. LabEx PERSYVAL-Lab [ANR-11-LABX-0025-01]
  2. Agence Nationale de la Recherche (ANR), project FROGH [ANR-16-CE12-0033]
  3. French National Research Agency under the Investissements d'avenir program [ANR-15-IDEX-02]
  4. Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH [R248-2017-2003]
  5. MRC (UKRI) [MC_PC_12028]

Polygenic prediction has the potential to contribute to precision medicine. Clumping and thresholding (C+T) is a widely used method to derive polygenic scores. When using C+T, several p value thresholds are tested to maximize the predictive ability of the derived polygenic scores. Along with this p value threshold, we propose to tune three other hyperparameters for C+T. We implement an efficient way to derive thousands of different C+T scores corresponding to a grid over four hyperparameters. For example, it takes a few hours to derive 123K different C+T scores for 300K individuals and 1M variants using 16 physical cores. We find that optimizing over these four hyperparameters improves the predictive performance of C+T in both simulations and real data applications, as compared to tuning only the p value threshold. A particularly large increase is seen when predicting depression status, from an AUC of 0.557 (95% CI: [0.544-0.569]) when tuning only the p value threshold to an AUC of 0.592 (95% CI: [0.580-0.604]) when tuning all four hyperparameters we propose for C+T. We further propose stacked clumping and thresholding (SCT), a polygenic score that results from stacking all derived C+T scores. Instead of choosing the one set of hyperparameters that maximizes prediction in some training set, SCT learns an optimal linear combination of all C+T scores by using an efficient penalized regression. We apply SCT to eight different case-control diseases in the UK Biobank data and find that SCT substantially improves prediction accuracy, with an average AUC increase of 0.035 over standard C+T.
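To make the grid idea concrete, the following is a minimal pure-Python sketch of how a set of C+T scores over four hyperparameters could be derived. It is not the authors' efficient implementation. The abstract does not name the three additional hyperparameters; the sketch assumes they are the clumping r² threshold, the clumping window size, and an imputation-quality (INFO) threshold. All function names and the toy data are hypothetical.

```python
from itertools import product


def r2(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic variant: treat as uncorrelated
    return sxy * sxy / (sxx * syy)


def clump(geno, pos, pval, r2_max, window, keep):
    """Greedy clumping: visit variants by increasing p value and keep one only
    if no already-kept variant within `window` base pairs has r^2 > r2_max."""
    order = sorted((j for j in range(len(pval)) if keep[j]), key=lambda j: pval[j])
    kept = []
    for j in order:
        if all(abs(pos[j] - pos[k]) > window or r2(geno[j], geno[k]) <= r2_max
               for k in kept):
            kept.append(j)
    return kept


def ct_score(geno, beta, pval, clumped, p_max):
    """Thresholding: effect-weighted allele counts over clumped variants with p <= p_max."""
    used = [j for j in clumped if pval[j] <= p_max]
    return [sum(beta[j] * geno[j][i] for j in used) for i in range(len(geno[0]))]


def ct_grid(geno, pos, beta, pval, info, r2_grid, window_grid, info_grid, p_grid):
    """One C+T score vector per point of the (assumed) four-dimensional grid."""
    scores = {}
    for r2_max, window, info_min in product(r2_grid, window_grid, info_grid):
        keep = [q >= info_min for q in info]
        # Clumping does not depend on the p value threshold, so it is done once
        # and reused for every threshold in p_grid.
        clumped = clump(geno, pos, pval, r2_max, window, keep)
        for p_max in p_grid:
            scores[(r2_max, window, info_min, p_max)] = ct_score(
                geno, beta, pval, clumped, p_max)
    return scores


# Hypothetical toy data: 4 variants (rows) x 6 individuals (columns).
geno = [
    [0, 1, 2, 1, 0, 2],  # variant 0
    [0, 1, 2, 1, 0, 2],  # variant 1, in perfect LD with variant 0
    [2, 0, 1, 0, 2, 1],  # variant 2
    [1, 1, 0, 2, 0, 1],  # variant 3
]
pos = [100, 150, 5000, 9000]        # base-pair positions
beta = [0.5, 0.4, -0.3, 0.1]        # GWAS effect sizes
pval = [1e-8, 1e-6, 1e-3, 0.2]      # GWAS p values
info = [0.99, 0.95, 0.90, 0.50]     # imputation quality

scores = ct_grid(geno, pos, beta, pval, info,
                 r2_grid=[0.2, 0.8], window_grid=[500, 10000],
                 info_grid=[0.3, 0.9], p_grid=[1e-5, 1e-2, 1.0])
print(len(scores))  # 2 * 2 * 2 * 3 = 24 score vectors
```

Standard C+T would pick the single grid point whose score predicts best in a training set. Under the same assumptions, the stacking step described in the abstract would instead treat the 24 score vectors as predictors in a penalized regression and learn their weights jointly.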
