Article

Polygenic risk scores outperform machine learning methods in predicting coronary artery disease status

Journal

GENETIC EPIDEMIOLOGY
Volume 44, Issue 2, Pages 125-138

Publisher

WILEY
DOI: 10.1002/gepi.22279

Keywords

classification; coronary artery disease; machine learning; polygenic risk scores; prediction

Funding

  1. Deutsche Forschungsgemeinschaft [KO2250/7]
  2. Bundesministerium für Bildung und Forschung [81Z1700103]

Coronary artery disease (CAD) is the leading cause of mortality worldwide and has substantial heritability with a polygenic architecture. Recent approaches to risk prediction were based on polygenic risk scores (PRS), which do not take possible nonlinear effects into account and were restricted to genetic loci associated with CAD. We benchmarked PRS, (penalized) logistic regression, naive Bayes (NB), random forests (RF), support vector machines (SVM), and gradient boosting (GB) on a data set of 7,736 CAD cases and 6,774 controls from Germany to identify the algorithms that classify CAD status most accurately. The final models were tested on an independent data set from Germany (527 CAD cases and 473 controls). We found PRS to be the best algorithm, yielding an area under the receiver operating characteristic curve (AUC) of 0.92 (95% CI [0.90, 0.95], 50,633 loci) in the German test data. NB and SVM (AUC ~0.81) performed better than RF and GB (AUC ~0.75). We conclude that PRS is superior to machine learning methods for predicting CAD.
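
As a rough illustration of the two quantities named in the abstract, the sketch below computes a PRS as a weighted sum of risk-allele dosages and an AUC as the probability that a randomly chosen case scores higher than a randomly chosen control. It is a minimal sketch under stated assumptions, not the study's pipeline: the function names, weights, genotypes, and labels are all hypothetical toy values, and the abstract does not say how the weights were derived or how loci were selected.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1, or 2) for one individual."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def auc(scores, labels):
    """AUC as the fraction of case/control pairs in which the case scores higher.

    Ties count as one half (the Mann-Whitney formulation of the AUC).
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy data (assumption, not from the study): per-locus weights,
# e.g. log odds ratios, and genotype dosages for four individuals.
weights = [0.5, -0.25, 1.0]
genotypes = [
    [2, 0, 1],  # case
    [1, 1, 1],  # case
    [0, 2, 0],  # control
    [1, 2, 0],  # control
]
labels = [1, 1, 0, 0]  # 1 = CAD case, 0 = control

scores = [polygenic_risk_score(g, weights) for g in genotypes]
print(scores, auc(scores, labels))
```

In this toy example every case outscores every control, so the AUC is 1.0; an AUC of 0.5 would correspond to a score that ranks cases and controls no better than chance.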
