4.7 Article

Machine learning prediction of incidence of Alzheimer's disease using large-scale administrative health data

Journal

NPJ DIGITAL MEDICINE
Volume 3, Issue 1, Pages -

Publisher

NATURE RESEARCH
DOI: 10.1038/s41746-020-0256-0

Keywords

-

Funding

  1. Seoul National University
  2. NHIS Ilsan Hospital Research Support Program
  3. National Institute of Mental Health [K01-MH109836]
  4. Brain Behavior Research Foundation Young Investigator Award
  5. Korean Scientists and Engineers Association Young Investigator Grant
  6. Brain Pool Program through the National Research Foundation of Korea (NRF) - Ministry of Science and ICT [200-20190251]


Nationwide population-based cohorts provide a new opportunity to build automated risk prediction models based on individuals' history of health and healthcare, beyond existing risk prediction models. We tested whether machine learning models can predict future incidence of Alzheimer's disease (AD) using large-scale administrative health data. From the Korean National Health Insurance Service database between 2002 and 2010, we obtained de-identified health data on adults older than 65 years (N = 40,736) containing 4,894 unique clinical features, including ICD-10 codes, medication codes, laboratory values, history of personal and family illness, and socio-demographics. To define incident AD we considered two operational definitions: definite AD, with diagnostic codes and dementia medication (n = 614), and probable AD, with only a diagnosis (n = 2,026). We trained and validated random forest, support vector machine, and logistic regression models to predict incident AD in each of the 1, 2, 3, and 4 subsequent years. For predicting future incidence of AD in balanced samples (bootstrapping), the machine learning models showed reasonable performance in 1-year prediction, with AUCs of 0.775 and 0.759 based on the definite AD and probable AD outcomes, respectively; in 2-year prediction, 0.730 and 0.693; in 3-year prediction, 0.677 and 0.644; and in 4-year prediction, 0.725 and 0.683. The results were similar when the entire (unbalanced) samples were used. Important clinical features selected in logistic regression included hemoglobin level, age, and urine protein level. This study may shed light on the utility of data-driven machine learning models based on large-scale administrative health data in AD risk prediction, which may enable better selection of individuals at risk for AD in clinical trials or earlier detection in clinical settings.
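The evaluation idea in the abstract, scoring a risk model by AUC on class-balanced bootstrap samples, can be illustrated with a small sketch. This is not the authors' pipeline: the function names (`auc`, `balanced_bootstrap`, `bootstrap_auc`) are hypothetical, the data are synthetic, and the balancing scheme (resampling each class with replacement to the size of the smaller class) is an assumption, since the abstract does not describe how the balanced samples were drawn.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen case (label 1) scores
    higher than a randomly chosen control (label 0); ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def balanced_bootstrap(labels, rng):
    """Return indices of a bootstrap sample with equal cases and controls.

    Assumption: each class is resampled with replacement to the size of
    the smaller class. The source abstract does not specify the scheme.
    """
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    k = min(len(pos_idx), len(neg_idx))
    return rng.choices(pos_idx, k=k) + rng.choices(neg_idx, k=k)


def bootstrap_auc(labels, scores, n_rounds=200, seed=0):
    """Mean AUC over repeated class-balanced bootstrap samples."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_rounds):
        idx = balanced_bootstrap(labels, rng)
        values.append(auc([labels[i] for i in idx], [scores[i] for i in idx]))
    return sum(values) / len(values)


# Synthetic example: 3 cases, 7 controls, with made-up risk scores.
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1, 0.5, 0.3]

full_sample_auc = auc(labels, scores)
balanced_auc = bootstrap_auc(labels, scores)
```

Because AUC depends only on how cases rank relative to controls, it is largely insensitive to class prevalence, which is consistent with the abstract's remark that results were similar on the balanced and unbalanced samples.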
