4.8 Article

Dense phenotyping from electronic health records enables machine learning-based prediction of preterm birth

Journal

BMC MEDICINE
Volume 20, Issue 1

Publisher

BMC
DOI: 10.1186/s12916-022-02522-x

Keywords

Preterm birth; Machine learning; Electronic health records; Artificial intelligence

Funding

  1. American Heart Association fellowship [20PRE35080073]
  2. National Institutes of Health (NIH) [T32GM007347, NLM K01LM012381, R35GM127087, 1R01HD101669, S10RR025141, UL1TR002243, UL1TR000445, UL1RR024975]
  3. March of Dimes
  4. Burroughs Wellcome Fund
  5. [U01HG004798]
  6. [R01NS032830]
  7. [RC2GM092618]
  8. [P50GM115305]
  9. [U01HG006378]
  10. [U19HL065962]
  11. [R01HD074711]

Machine learning models based on billing codes from electronic health records can accurately predict singleton preterm birth risk and outperform models trained on known risk factors. These models also stratify deliveries into interpretable groups and predict preterm birth subtypes, mode of delivery, and recurrent preterm birth. This study suggests that machine learning has great potential to improve medical care during pregnancy.

Background

Identifying pregnancies at risk for preterm birth, one of the leading causes of worldwide infant mortality, has the potential to improve prenatal care. However, we lack broadly applicable methods to accurately predict preterm birth risk. The dense longitudinal information present in electronic health records (EHRs) is enabling scalable and cost-efficient risk modeling of many diseases, but EHR resources have been largely untapped in the study of pregnancy.

Methods

Here, we apply machine learning to diverse data from EHRs with 35,282 deliveries to predict singleton preterm birth.

Results

We find that machine learning models based on billing codes alone can predict preterm birth risk at various gestational ages (e.g., ROC-AUC = 0.75, PR-AUC = 0.40 at 28 weeks of gestation) and outperform comparable models trained using known risk factors (e.g., ROC-AUC = 0.65, PR-AUC = 0.25 at 28 weeks). Examining the patterns learned by the model reveals it stratifies deliveries into interpretable groups, including high-risk preterm birth subtypes enriched for distinct comorbidities. Our machine learning approach also predicts preterm birth subtypes (spontaneous vs. indicated), mode of delivery, and recurrent preterm birth. Finally, we demonstrate the portability of our approach by showing that the prediction models maintain their accuracy on a large, independent cohort (5978 deliveries) from a different healthcare system.

Conclusions

By leveraging rich phenotypic and genetic features derived from EHRs, we suggest that machine learning algorithms have great potential to improve medical care during pregnancy. However, further work is needed before these models can be applied in clinical settings.
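
The results above are reported as ROC-AUC and PR-AUC. For readers unfamiliar with these metrics, the following is a minimal sketch of how each can be computed from binary labels and predicted risk scores. It is not the authors' code: the function names `roc_auc` and `average_precision` are hypothetical, the toy labels and scores are invented for illustration and are not study data, and PR-AUC is approximated here by average precision, assuming no tied scores.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a randomly chosen positive
    scores higher than a randomly chosen negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve: the mean of the
    precision values observed at the rank of each true positive.
    Assumes scores are not tied (illustrative simplification)."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank deliveries from highest to lowest predicted risk.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_pos


# Toy example (hypothetical values): 1 = preterm, 0 = term.
toy_labels = [1, 0, 1, 0, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

print(f"ROC-AUC: {roc_auc(toy_labels, toy_scores):.3f}")
print(f"PR-AUC (average precision): {average_precision(toy_labels, toy_scores):.3f}")
```

A classifier that ranks at random has an expected ROC-AUC of 0.5 regardless of class balance, whereas its expected PR-AUC equals the fraction of positive cases. That is why PR-AUC is often reported alongside ROC-AUC when the outcome is relatively uncommon.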
