Article

A machine learning approach to identifying patients with pulmonary hypertension using real-world electronic health records

Journal

INTERNATIONAL JOURNAL OF CARDIOLOGY
Volume 374, Pages 95-99

Publisher

ELSEVIER IRELAND LTD
DOI: 10.1016/j.ijcard.2022.12.016

Keywords

Artificial intelligence; Machine learning; Pulmonary hypertension; Diagnostic delay; Early diagnosis; Electronic health record

This study developed a machine learning model to identify patients with pulmonary hypertension (PH) from a US-based electronic health record database. PH and control patients were identified using diagnostic, treatment, and procedure codes, and the model achieved an AUROC of 0.92. Performance remained good in subgroups of patients with different types of PH.
Background: This study aimed to develop a machine learning (ML) model to identify patients who are likely to have pulmonary hypertension (PH), using a large patient-level US-based electronic health record (EHR) database.
Methods: A gradient boosting model, XGBoost, was developed using data from Optum's US-based de-identified EHR dataset (2007-2019). PH and disease control adult patients were identified using diagnostic, treatment and procedure codes and were randomly split into the training (90%) or test set (10%). Model features included patient demographics, physician visits, diagnoses, procedures, prescriptions, and laboratory test results. SHapley Additive exPlanations values were used to determine feature importance.
Results: We identified 11,279,478 control and 115,822 PH patients (mean age, respectively: 62 and 68 years, both 53% female). The final model used 165 features, with the most important predictive features including diagnosis of heart failure, shortness of breath and atrial fibrillation. The model predicted PH with an area under the receiver operating characteristic curve (AUROC) of 0.92. AUROC remained above 0.80 for the prediction of PH up to and beyond 18 months before diagnosis. Among the PH patients, we also identified 955 pulmonary arterial hypertension (PAH) and 1432 chronic thromboembolic pulmonary hypertension (CTEPH) patients, and the range of AUROCs obtained for these cohorts was 0.79-0.90 and 0.87-0.96, respectively.
Conclusions: This model to detect PH based on patients' EHR records is viable and performs well in subgroups of PAH and CTEPH patients. This approach has the potential to improve patient outcomes by reducing diagnostic delay in PH.
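
For readers unfamiliar with the evaluation terms in the abstract, the following is a minimal Python sketch of a random 90%/10% train/test split and of how an AUROC can be computed from predicted scores. It is not the authors' code: the function names `split_train_test` and `auroc`, the seed, and the toy labels and scores are all hypothetical and chosen only for illustration. The AUROC is computed here as the fraction of (PH, control) pairs in which the PH patient receives the higher score, with ties counted as half.

```python
import random


def split_train_test(items, test_fraction=0.1, seed=0):
    """Randomly split items into (train, test); test_fraction=0.1 gives 90%/10%."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is repeatable
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random negative.

    labels: 1 for a positive (e.g. PH) case, 0 for a control.
    scores: model outputs, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correctly ranked pair
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not from the study).
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.05]
# 11 of the 12 positive/negative pairs are ranked correctly.
print(auroc(toy_labels, toy_scores))

train_ids, test_ids = split_train_test(range(100))
print(len(train_ids), len(test_ids))
```

The pairwise loop is quadratic in the number of patients and is meant only to make the definition concrete; at the scale reported in the abstract, a rank-based or library implementation would be used instead.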