Scalable Feature Engineering from Electronic Free Text Notes to Supplement Confounding Adjustment of Claims-Based Pharmacoepidemiologic Studies

CLINICAL PHARMACOLOGY & THERAPEUTICS (2023)

Journal

CLINICAL PHARMACOLOGY & THERAPEUTICS

Volume 113, Issue 4, Pages 832-838

Publisher

WILEY

DOI: 10.1002/cpt.2826

Automated Summary

Natural language processing (NLP) tools can convert free-text notes (FTNs) from electronic health records (EHRs) into data features that supplement confounding adjustment in pharmacoepidemiologic studies. In this study, unsupervised NLP was used to generate high-dimensional feature spaces from FTNs. Adding these features to claims-based predictors improved prediction of treatment choice but yielded little to no improvement in prediction of outcome risk. These findings have implications for improving confounding adjustment in pharmacoepidemiologic studies using EHR data.

Abstract

Natural language processing (NLP) tools turn free-text notes (FTNs) from electronic health records (EHRs) into data features that can supplement confounding adjustment in pharmacoepidemiologic studies. However, current applications are difficult to scale. We used unsupervised NLP to generate high-dimensional feature spaces from FTNs to improve prediction of drug exposure and outcomes compared with claims-based analyses. We linked Medicare claims with EHR data to generate three cohort studies comparing different classes of medications on the risk of various clinical outcomes. We used bag-of-words to generate features for the 20,000 most prevalent terms from FTNs. We compared machine learning (ML) prediction algorithms using four sets of candidate predictors: Set1 (39 researcher-specified variables), Set2 (Set1 + ML-selected claims codes), Set3 (Set1 + ML-selected NLP-generated features), and Set4 (Set1 + Set2 + Set3). When modeling treatment choice, we observed a consistent pattern across the examples: ML models utilizing Set4 performed best, followed by Set2, Set3, then Set1. When modeling the outcome risk, there was little to no improvement beyond models based on Set1. Supplementing claims data with NLP-generated features from FTNs improved prediction of prescribing choices but yielded little or no improvement in clinical risk prediction. These findings have implications for strategies to improve confounding adjustment using EHR data in pharmacoepidemiologic studies.
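The bag-of-words step described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the tokenizer, the use of document frequency as the measure of prevalence, the raw-count encoding, the function names, and the toy notes are all assumptions made for illustration. In the study the vocabulary was the 20,000 most prevalent terms; here `k=3` keeps the example readable.

```python
import re
from collections import Counter

def tokenize(note):
    """Lowercase a note and split it into alphabetic tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", note.lower())

def top_terms(notes, k):
    """Return the k terms that appear in the most notes (document frequency)."""
    doc_freq = Counter()
    for note in notes:
        # Count each term once per note so long notes do not dominate.
        doc_freq.update(set(tokenize(note)))
    # Sort by descending prevalence, then alphabetically so ties are deterministic.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]

def bag_of_words(notes, vocabulary):
    """Build one row of term counts per note, with columns ordered as in vocabulary."""
    rows = []
    for note in notes:
        counts = Counter(tokenize(note))
        rows.append([counts[term] for term in vocabulary])
    return rows

# Hypothetical toy notes, invented for illustration only.
notes = [
    "Patient reports chest pain and fatigue",
    "No chest pain today; fatigue persists",
    "Patient denies pain",
]
vocabulary = top_terms(notes, k=3)
features = bag_of_words(notes, vocabulary)
```

Each row of `features` is one note and each column is one vocabulary term. In a setup like the one the abstract describes, such columns would be candidate predictors passed to an ML selection step alongside the researcher-specified variables. Note that a plain bag-of-words ignores negation ("denies pain" still counts "pain"), which is one reason downstream feature selection matters.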
