4.7 Article

Predictive Attributes for Developing Long COVID-A Study Using Machine Learning and Real-World Data from Primary Care Physicians in Germany

Journal

JOURNAL OF CLINICAL MEDICINE
Volume 12, Issue 10, Pages -

Publisher

MDPI
DOI: 10.3390/jcm12103511

Keywords

COVID-19; long COVID; machine learning; gradient boosting classifier

Ask authors/readers for more resources

In this study, patient medical history data from primary care practices in Germany were used to predict post-COVID-19 conditions and evaluate associated factors using machine learning. The LGBM classifier was optimized and SHAP values were used to assess feature importance and direction of influence. The results identified several predictive features, including COVID-19 variant, physician practice, age, diagnoses and therapies, sick days ratio, sex, vaccination rate, and various diseases and symptoms, for the development of long COVID. This exploratory study provides initial insights into predicting long COVID risk factors using patient history before COVID-19 infection.
1) In the present study, we used data comprising patient medical histories from a panel of primary care practices in Germany to predict post-COVID-19 conditions in patients after COVID-19 diagnosis and to evaluate the relevant factors associated with these conditions using machine learning methods. (2) Methods: Data retrieved from the IQVIA (TM) Disease Analyzer database were used. Patients with at least one COVID-19 diagnosis between January 2020 and July 2022 were selected for inclusion in the study. Age, sex, and the complete history of diagnoses and prescription data before COVID-19 infection at the respective primary care practice were extracted for each patient. A gradient boosting classifier (LGBM) was deployed. The prepared design matrix was randomly divided into train (80%) and test data (20%). After optimizing the hyperparameters of the LGBM classifier by maximizing the F2 score, model performance was evaluated using several test metrics. We calculated SHAP values to evaluate the importance of the individual features, but more importantly, to evaluate the direction of influence of each feature in our dataset, i.e., whether it is positively or negatively associated with a diagnosis of long COVID. (3) Results: In both the train and test data sets, the model showed a high recall (sensitivity) of 81% and 72% and a high specificity of 80% and 80%; this was offset, however, by a moderate precision of 8% and 7% and an F2-score of 0.28 and 0.25. The most common predictive features identified using SHAP included COVID-19 variant, physician practice, age, distinct number of diagnoses and therapies, sick days ratio, sex, vaccination rate, somatoform disorders, migraine, back pain, asthma, malaise and fatigue, as well as cough preparations. (4) Conclusions: The present exploratory study describes an initial investigation of the prediction of potential features increasing the risk of developing long COVID after COVID-19 infection by using the patient history from electronic medical records before COVID-19 infection in primary care practices in Germany using machine learning. Notably, we identified several predictive features for the development of long COVID in patient demographics and their medical histories.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available