Article

Machine Learning Models for Predicting Adverse Pregnancy Outcomes in Pregnant Women with Systemic Lupus Erythematosus

Journal

DIAGNOSTICS
Volume 13, Issue 4

Publisher

MDPI
DOI: 10.3390/diagnostics13040612

Keywords

prediction; machine learning; systemic lupus erythematosus; SLE; pregnancy; gestation; random forest

This study aimed to develop predictive models using machine learning techniques to extract more information from the medical records of pregnant women with SLE for predicting adverse outcomes. After analysis and selection, 18 variables showed statistical differences and more than 40 variables were identified as contributing predictors. The Random Forest (RF) algorithm demonstrated the best discrimination ability among the overall predictive models and achieved the best performance in the real-time predictive accuracy assessment. Machine learning models could compensate for the limitations of statistical methods when sample sizes are small and variables are numerous, and the RF classifier performed well on such structured medical records.
Predicting adverse outcomes is essential for minimizing risks in pregnant women with systemic lupus erythematosus (SLE). Statistical analysis can be limited by the small sample size of childbearing patients, even though their medical records are rich in information. This study aimed to develop predictive models using machine learning (ML) techniques to extract more of that information. We performed a retrospective analysis of 51 pregnant women with SLE, covering 288 variables. After correlation analysis and feature selection, six ML models were applied to the filtered dataset. The performance of these overall models was evaluated using the receiver operating characteristic (ROC) curve. Real-time models covering different timespans of gestation were also explored. Eighteen variables demonstrated statistical differences between the two groups; more than forty variables were selected as contributing predictors by the ML variable selection strategies, and the variables identified by both selection strategies were taken as the influential indicators. The Random Forest (RF) algorithm demonstrated the best discrimination ability on the current dataset for the overall predictive models regardless of the missing-data rate, with Multi-Layer Perceptron models ranking second. RF also achieved the best performance in the assessment of real-time predictive accuracy. ML models could compensate for the limitations of statistical methods when a small sample size coincides with a large number of variables, and the RF classifier performed best among the models tested when applied to such structured medical records.
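
The abstract states that the models were evaluated by the ROC curve but gives no computational detail. As a rough illustration only, the sketch below computes the area under the ROC curve from predicted risk scores using the pairwise (Mann-Whitney) formulation. The function name `roc_auc` and the toy labels and scores are assumptions made up for this example; they are not the study's code or data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney pairwise statistic.

    labels: sequence of 0/1 values (1 = adverse outcome, by assumption).
    scores: predicted risk for each case, higher meaning more likely adverse.
    A tie between a positive and a negative case counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    # Fraction of positive/negative pairs that the scores rank correctly.
    return wins / (len(pos) * len(neg))


# Toy data invented for illustration only (not from the study).
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. With very small cohorts such as the 51 patients described here, any single AUC estimate is likely to carry wide uncertainty, so comparisons between models would normally rely on resampling rather than one split.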
