Article

Comparison of machine-learning algorithms to build a predictive model for detecting undiagnosed diabetes - ELSA-Brasil: accuracy study

Journal

SAO PAULO MEDICAL JOURNAL
Volume 135, Issue 3, Pages 234-246

Publisher

ASSOCIACAO PAULISTA MEDICINA
DOI: 10.1590/1516-3180.2016.0309010217

Keywords

Supervised machine learning; Decision support techniques; Data mining; Models, statistical; Diabetes mellitus, type 2

Funding

  1. Brazilian Ministry of Health (Science and Technology Department)
  2. Brazilian Ministry of Science and Technology (Study and Project Financing Sector) [01 06 0010.00 RS, 01 06 0212.00 BA, 01 06 0300.00 ES, 01 06 0278.00 MG, 01 06 0115.00 SP, 01 06 0071.00 RJ, 478518_2013-7]
  3. Brazilian Ministry of Science and Technology (CNPq National Research Council) [01 06 0010.00 RS, 01 06 0212.00 BA, 01 06 0300.00 ES, 01 06 0278.00 MG, 01 06 0115.00 SP, 01 06 0071.00 RJ, 478518_2013-7]
  4. CAPES (Coordenacao de Aperfeicoamento de Pessoal de Nivel Superior) [AUXPE PROEX 2587/2012]

CONTEXT AND OBJECTIVE: Type 2 diabetes is a chronic disease associated with a wide range of serious health complications that have a major impact on overall health. The aims here were to develop and validate predictive models for detecting undiagnosed diabetes using data from the Longitudinal Study of Adult Health (ELSA-Brasil) and to compare the performance of different machine-learning algorithms in this task.

DESIGN AND SETTING: Comparison of machine-learning algorithms to develop predictive models using data from ELSA-Brasil.

METHODS: After selecting a subset of 27 candidate variables from the literature, models were built and validated in four sequential steps: (i) parameter tuning with tenfold cross-validation, repeated three times; (ii) automatic variable selection using forward selection, a wrapper strategy with four different machine-learning algorithms and tenfold cross-validation (repeated three times), to evaluate each subset of variables; (iii) error estimation of model parameters with tenfold cross-validation, repeated ten times; and (iv) generalization testing on an independent dataset. The models were created with the following machine-learning algorithms: logistic regression, artificial neural network, naive Bayes, K-nearest neighbor and random forest.

RESULTS: The best models were created using artificial neural networks and logistic regression. These achieved mean areas under the curve of, respectively, 75.24% and 74.98% in the error estimation step and 74.17% and 74.41% in the generalization testing step.

CONCLUSION: Most of the predictive models produced similar results and demonstrated the feasibility of identifying individuals with the highest probability of having undiagnosed diabetes through easily obtained clinical data.
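
The error estimation step (iii), tenfold cross-validation repeated ten times and scored by area under the ROC curve, can be illustrated with a minimal sketch. This is not the authors' code or data: the synthetic dataset, the hand-rolled logistic regression, the stratified fold assignment, and every function name below are assumptions made only to show the procedure using the Python standard library.

```python
import math
import random


def auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_stratified_kfold(labels, k, repeats, seed=0):
    """Yield (train_idx, test_idx) for `repeats` reshuffled k-fold splits.

    Stratification (an assumption here) keeps both classes in every fold.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        for cls in (0, 1):
            members = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(members)
            for position, i in enumerate(members):
                folds[position % k].append(i)
        for held_out in range(k):
            train = [j for m, f in enumerate(folds) if m != held_out for j in f]
            yield train, folds[held_out]


def predict(w, x):
    """Predicted probability; the last element of w is the intercept."""
    z = sum(wj * v for wj, v in zip(w, x)) + w[-1]
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, epochs=100, lr=0.5):
    """Batch gradient descent for logistic regression (illustrative only)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for j, v in enumerate(xi):
                grad[j] += err * v
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def make_data(n=200, seed=1):
    """Synthetic stand-in data: one informative feature, one noise feature."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.3 else 0
        X.append([rng.gauss(1.0 * label, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


def cross_validated_auc(X, y, k=10, repeats=10):
    """Mean AUC over all k * repeats held-out folds, plus the per-fold values."""
    aucs = []
    for train, test in repeated_stratified_kfold(y, k, repeats):
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        scores = [predict(w, X[i]) for i in test]
        aucs.append(auc(scores, [y[i] for i in test]))
    return sum(aucs) / len(aucs), aucs


if __name__ == "__main__":
    X, y = make_data()
    mean_auc, fold_aucs = cross_validated_auc(X, y, k=10, repeats=10)
    print(f"mean AUC over {len(fold_aucs)} folds: {mean_auc:.4f}")
```

Averaging over repeated splits reduces the dependence of the estimate on any single random partition. The same loop could be reused with fewer repetitions for the tuning and variable selection steps, and the final model would still need to be checked on an independent dataset, as in step (iv).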
