Article

A patient network-based machine learning model for disease prediction: The case of type 2 diabetes mellitus

Journal

APPLIED INTELLIGENCE
Volume 52, Issue 3, Pages 2411-2422

Publisher

SPRINGER
DOI: 10.1007/s10489-021-02533-w

Keywords

Disease prediction; Type 2 Diabetes; Administrative data; Network analysis; Machine learning

The study utilized real-world healthcare data and developed a model for predicting chronic diseases, specifically type 2 diabetes, by combining patient networks and machine learning methods to accurately assess the risk of developing such conditions.
In recent years, the prevalence of chronic diseases such as type 2 diabetes mellitus (T2DM) has increased, placing a heavy burden on healthcare systems. Because regular monitoring of patients is expensive and impractical, understanding chronic disease progression and identifying patients at risk of developing comorbidities are crucial. This research used a real-world administrative claims dataset for T2DM to develop a disease prediction approach that combines a patient network with machine learning. The healthcare data of 1,028 T2DM patients and 1,028 non-T2DM patients were extracted from the de-identified data to predict the risk of T2DM. The proposed model is based on the 'patient network', which uses graph theory to represent the underlying relationships among health conditions for a group of patients diagnosed with the same disease. Besides patients' socio-demographic and behavioural characteristics, the attributes of the 'patient network' (e.g., centrality measures) reveal latent patient features that are effective in risk prediction. We apply eight machine learning models (Logistic Regression, K-Nearest Neighbours, Support Vector Machine, Naive Bayes, Decision Tree, Random Forest, XGBoost and Artificial Neural Network) to the extracted features to predict chronic disease risk. Extensive experiments show that the machine learning classifiers in the proposed framework achieved an Area Under the Curve (AUC) ranging from 0.79 to 0.91. The Random Forest model outperformed the other models, and the eigenvector centrality and closeness centrality of the network, together with patient age, were the most important features for the model. The performance of our model suggests promising applications in healthcare services. We also provide strong evidence that the extracted latent features are essential in disease risk prediction.
The proposed approach offers vital insight into chronic disease risk prediction that could benefit healthcare service providers and their stakeholders.
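
To make the idea of network-derived features concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a construction the abstract does not spell out: health conditions are nodes, two conditions are linked when they appear in the same patient's record, and each patient's feature is the mean centrality of their conditions. The toy records and all function names are hypothetical.

```python
from collections import deque
from itertools import combinations
from math import sqrt

# Hypothetical toy records: patient id -> set of recorded health conditions.
records = {
    "p1": {"hypertension", "obesity", "dyslipidaemia"},
    "p2": {"hypertension", "obesity"},
    "p3": {"obesity", "sleep_apnoea"},
    "p4": {"hypertension", "dyslipidaemia"},
}

def build_network(records):
    """Undirected graph: conditions are nodes; an edge links two conditions
    recorded for the same patient (an assumed construction)."""
    adj = {}
    for conditions in records.values():
        for c in conditions:
            adj.setdefault(c, set())
        for a, b in combinations(sorted(conditions), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def closeness_centrality(adj):
    """Reachable node count divided by the sum of BFS distances to them."""
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores

def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Power iteration on (A + I), L2-normalised at every step."""
    x = {n: 1.0 for n in adj}
    for _ in range(iterations):
        nxt = {n: x[n] + sum(x[m] for m in adj[n]) for n in adj}
        norm = sqrt(sum(v * v for v in nxt.values())) or 1.0
        nxt = {n: v / norm for n, v in nxt.items()}
        if sum(abs(nxt[n] - x[n]) for n in adj) < tol:
            return nxt
        x = nxt
    return x

def patient_features(records, adj):
    """Mean centrality of each patient's conditions (an assumed aggregation)."""
    close = closeness_centrality(adj)
    eig = eigenvector_centrality(adj)
    feats = {}
    for pid, conditions in records.items():
        k = len(conditions) or 1  # avoid division by zero for empty records
        feats[pid] = {
            "mean_closeness": sum(close[c] for c in conditions) / k,
            "mean_eigenvector": sum(eig[c] for c in conditions) / k,
        }
    return feats

network = build_network(records)
features = patient_features(records, network)
```

In this sketch, the resulting per-patient values would sit alongside attributes such as age as inputs to a classifier like the Random Forest mentioned above. How the paper actually aggregates network attributes per patient may differ.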
