4.5 Article

Risk prediction of diabetes and pre-diabetes based on physical examination data

Journal

MATHEMATICAL BIOSCIENCES AND ENGINEERING
Volume 19, Issue 4, Pages 3597-3608

Publisher

AMER INST MATHEMATICAL SCIENCES-AIMS
DOI: 10.3934/mbe.2022166

Keywords

diabetes; fasting plasma glucose; physical examination; XGBoost

Funding

  1. National Key R&D Program of China [2020YFC2003403]
  2. Capital's Funds for Health Improvement and Research [2018-2-2242]
  3. National Natural Science Foundation of China [82130112]


This study collected physical examination data and built classification models to enable early diagnosis of diabetes and identify related risk factors.
Diabetes is a metabolic disorder caused by insufficient insulin secretion and insulin secretion disorders. The progression from health to diabetes generally passes through three stages: health, pre-diabetes and type 2 diabetes. Early diagnosis of diabetes is the most effective way to prevent and control diabetes and its complications. In this work, we collected physical examination data from Beijing Physical Examination Center from January 2006 to December 2017, and divided the population into three groups according to the WHO (1999) Diabetes Diagnostic Standards: normal fasting plasma glucose (NFG) (FPG < 6.1 mmol/L), mildly impaired fasting plasma glucose (IFG) (6.1 mmol/L <= FPG < 7.0 mmol/L) and type 2 diabetes (T2DM) (FPG >= 7.0 mmol/L). Finally, we obtained 1,221,598 NFG samples, 285,965 IFG samples and 387,076 T2DM samples, with a total of 15 physical examination indexes. Furthermore, taking eXtreme Gradient Boosting (XGBoost), random forest (RF), logistic regression (LR), and fully connected neural network (FCN) as classifiers, four models were constructed to distinguish NFG, IFG and T2DM. The comparison results show that XGBoost has the best performance, with AUC (macro) of 0.7874 and AUC (micro) of 0.8633. In addition, based on the XGBoost classifier, three binary classification models were also established to discriminate NFG from IFG, NFG from T2DM, and IFG from T2DM. On the independent dataset, the AUCs were 0.7808, 0.8687 and 0.7067, respectively. Finally, we analyzed the importance of the features and identified the risk factors associated with diabetes.
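
The grouping rule in the abstract can be illustrated with a minimal sketch. It assumes the WHO (1999) fasting plasma glucose cut-offs: NFG below 6.1 mmol/L, IFG from 6.1 up to but not including 7.0 mmol/L, and T2DM at 7.0 mmol/L or above. The function name `fpg_group` and its parameter are hypothetical, chosen for illustration only; this is not the authors' code.

```python
def fpg_group(fpg_mmol_per_l: float) -> str:
    """Map a fasting plasma glucose value (mmol/L) to a group label.

    Hypothetical helper; the cut-offs follow the WHO (1999) criteria
    quoted in the abstract.
    """
    if fpg_mmol_per_l < 0:
        raise ValueError("FPG must be non-negative")
    if fpg_mmol_per_l < 6.1:
        return "NFG"   # normal fasting plasma glucose
    if fpg_mmol_per_l < 7.0:
        return "IFG"   # mildly impaired fasting plasma glucose
    return "T2DM"      # type 2 diabetes


# Example: label a few made-up FPG readings.
labels = [fpg_group(v) for v in (5.2, 6.1, 6.9, 7.0)]
```

The boundaries are half-open so that every non-negative reading falls into exactly one group.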
