Article

A gradient boosting-based mortality prediction model for COVID-19 patients

Journal

NEURAL COMPUTING & APPLICATIONS

Publisher

SPRINGER LONDON LTD
DOI: 10.1007/s00521-023-08997

Keywords

COVID-19; Machine learning; Gradient-based boosting machines; SMOTE; Random under-sampling; Clustering-based under-sampling

This study proposes a gradient boosting-based model for predicting the mortality of COVID-19 patients and improves prediction accuracy by incorporating resampling strategies. It uses real COVID-19 data covering patients' travel, health, geographical, and demographic information, and addresses the class imbalance in the dataset with the synthetic minority oversampling technique (SMOTE), random under-sampling, and clustering-based under-sampling. The experimental results show that age, Wuhan origin, and the time between symptom onset and hospital visit influence COVID-19 patient mortality, and they compare the performance of the XGBoost, LightGBM, and CatBoost algorithms. The study emphasizes the importance of addressing class imbalance and of using resampling strategies to improve the accuracy of COVID-19 mortality prediction.
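The resampling strategies named above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the two features (age, days from symptom onset to hospital visit), the class sizes, and the helper names `smote` and `random_under_sample` are assumptions made for illustration, and the data are randomly generated toy values. Clustering-based under-sampling is not shown.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (basic SMOTE idea)."""
    rng = rng or random.Random(0)
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def random_under_sample(majority, n_keep, rng=None):
    """Keep a uniformly random subset of the majority class."""
    rng = rng or random.Random(0)
    return rng.sample(majority, n_keep)


# Toy, randomly generated data (hypothetical features: age, days to hospital).
rng = random.Random(42)
survived = [[rng.gauss(45, 12), rng.gauss(3, 1)] for _ in range(90)]
died = [[rng.gauss(70, 8), rng.gauss(7, 2)] for _ in range(10)]

# Option 1: oversample the minority class up to the majority size.
oversampled_died = died + smote(died, len(survived) - len(died), k=5, rng=rng)

# Option 2: undersample the majority class down to the minority size.
undersampled_survived = random_under_sample(survived, len(died), rng=rng)

print(len(survived), len(oversampled_died))
print(len(undersampled_survived), len(died))
```

Either option yields a balanced training set. Oversampling keeps every original record but adds interpolated ones, while under-sampling discards majority records, which is one reason different models may be favoured under each strategy.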
The COVID-19 pandemic has been a global public health concern since March 11, 2020. Healthcare systems struggled to meet patients' growing needs for diagnosis, treatment, and care, and advanced intelligence and computing technologies became essential for coping with the overwhelming demand. In particular, artificial intelligence techniques have become essential for identifying and triaging patients, predicting disease severity, and detecting outcomes. The aim of this paper is to propose a gradient boosting-based model that predicts the mortality of COVID-19 patients and to improve prediction accuracy by incorporating resampling strategies. A real COVID-19 dataset that includes patients' travel, health, geographical, and demographic information is obtained from a public repository. Because the dataset suffers from class imbalance, resampling strategies, namely the synthetic minority oversampling technique (SMOTE), random under-sampling, and clustering-based under-sampling, are applied to balance the class distribution. Gradient boosting machines (GBMs), namely extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and categorical boosting (CatBoost), are then analyzed in terms of accuracy and computational time. Random search is used to find the optimal hyper-parameters for the algorithms, and a stacking-based hybrid model that combines XGBoost, LightGBM, and CatBoost is used for comparison. The experiments also investigate the factors that can influence the mortality of COVID-19 patients: the patient's age, whether the patient was from Wuhan, and the number of days between first noticing symptoms and visiting the hospital are found to affect mortality. Over- and under-sampling mitigated the class imbalance. XGBoost, LightGBM, and CatBoost are compared on various performance metrics to determine the most suitable GBM for the proposed system. The experimental results reveal that the stacking-based hybrid model performs well on the balanced dataset produced by SMOTE, whereas CatBoost produces superior results on datasets balanced by random under-sampling and clustering-based under-sampling. The study also emphasizes the importance of addressing the imbalanced class distribution in the dataset and of incorporating resampling strategies to improve prediction accuracy. These promising results confirm the success of the proposed system in predicting COVID-19 mortality.
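The tuning and stacking steps described in the abstract can be sketched as follows. This is a hedged illustration, not the paper's pipeline: scikit-learn's `GradientBoostingClassifier` stands in for XGBoost, LightGBM, and CatBoost, the data come from `make_classification` rather than the COVID-19 repository, and the search space, the F1 scoring choice, and the logistic-regression meta-learner are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic imbalanced data (about 10% positives); not the study's dataset.
X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.25, random_state=0
)

# Random search: sample a few hyper-parameter combinations instead of
# exhaustively evaluating the whole grid.
search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions={
        "n_estimators": [50, 100, 150, 200],
        "max_depth": [2, 3, 4, 5],
        "learning_rate": [0.01, 0.05, 0.1, 0.2],
    },
    n_iter=5,
    scoring="f1",
    cv=3,
    random_state=0,
)
search.fit(X_train, y_train)

# Stacking: out-of-fold predictions of the base boosters become the inputs
# of a simple meta-learner.
stack = StackingClassifier(
    estimators=[
        ("gbm_tuned", search.best_estimator_),
        ("gbm_subsampled", GradientBoostingClassifier(subsample=0.8, random_state=1)),
    ],
    final_estimator=LogisticRegression(),
    cv=3,
)
stack.fit(X_train, y_train)

stacked_f1 = f1_score(y_test, stack.predict(X_test))
print("best params:", search.best_params_)
print("stacked F1:", stacked_f1)
```

In a setup closer to the paper's, the base estimators would be the three boosting libraries and the training split would first be balanced with one of the resampling strategies, applied to the training data only so the test set keeps its original class distribution.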
