Missing data matter: an empirical evaluation of the impacts of missing EHR data in comparative effectiveness research

JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION (2023)

Journal

JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION

Volume 30, Issue 7, Pages 1246-1256

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/jamia/ocad066

Keywords

electronic health records; empirical study; missing data; multiple imputation

Automated Summary

This study aimed to quantify the impacts of missing data in comparative effectiveness research (CER) using electronic health records (EHRs) and to compare the performance of different imputation methods. When the missing data depended on the stochastic progression of disease and on medical practice patterns, the spline smoothing method produced results close to those obtained without missing data. Compared to multiple imputation, spline smoothing generally performed similarly or better, with smaller estimation bias and less power loss. When using EHRs for CER, it is therefore important to leverage the temporal information of the disease trajectory to impute missing values and to consider the missing rate and effect size when choosing an imputation method.

Abstract

Objectives: The impacts of missing data in comparative effectiveness research (CER) using electronic health records (EHRs) may vary depending on the type and pattern of missing data. In this study, we aimed to quantify these impacts and compare the performance of different imputation methods.

Materials and Methods: We conducted an empirical (simulation) study to quantify the bias and power loss in estimating treatment effects in CER using EHR data. We considered various missing scenarios and used propensity scores to control for confounding. We compared the performance of the multiple imputation and spline smoothing methods in handling missing data.

Results: When missing data depended on the stochastic progression of disease and medical practice patterns, the spline smoothing method produced results that were close to those obtained when there were no missing data. Compared to multiple imputation, spline smoothing generally performed similarly or better, with smaller estimation bias and less power loss. Multiple imputation can still reduce study bias and power loss in some restrictive scenarios, eg, when missing data do not depend on the stochastic process of disease progression.

Discussion and Conclusion: Missing data in EHRs could lead to biased estimates of treatment effects and false negative findings in CER even after missing data were imputed. It is important to leverage the temporal information of disease trajectory to impute missing values when using EHRs as a data resource for CER and to consider the missing rate and the effect size when choosing an imputation method.
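The point about temporal information can be illustrated with a small toy example. The sketch below is not the authors' simulation: the trajectory, the visit rule, and all function names are hypothetical, and plain linear interpolation over time is used as a simplified stand-in for spline smoothing. It assumes a lab value that rises steadily and is recorded more often once the patient is sicker, so that missingness depends on disease progression, and it compares a time-aware fill against a fill that ignores visit times.

```python
import random

def mean_impute(values):
    """Fill None entries with the mean of the observed values (ignores time)."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def linear_time_impute(times, values):
    """Fill None entries by interpolating between the nearest observed visits.

    Simplified stand-in for spline smoothing; assumes at least one observed value.
    """
    observed = [(t, v) for t, v in zip(times, values) if v is not None]
    filled = []
    for t, v in zip(times, values):
        if v is not None:
            filled.append(v)
            continue
        before = [(ot, ov) for ot, ov in observed if ot < t]
        after = [(ot, ov) for ot, ov in observed if ot > t]
        if before and after:
            (t0, y0), (t1, y1) = before[-1], after[0]
            filled.append(y0 + (y1 - y0) * (t - t0) / (t1 - t0))
        else:
            # No observation on one side: carry the nearest observed value.
            filled.append((before[-1] if before else after[0])[1])
    return filled

def mean_abs_error(filled, truth, missing_idx):
    """Average absolute error over the visits that were originally missing."""
    return sum(abs(filled[i] - truth[i]) for i in missing_idx) / len(missing_idx)

# Hypothetical monthly trajectory of a lab value that worsens over one year.
rng = random.Random(0)
times = list(range(12))
truth = [6.0 + 0.25 * t for t in times]

# Assumed practice pattern: routine visit every 3 months, plus monthly
# monitoring once the latent value reaches 8.0 (sicker patients are seen more).
recorded = [t % 3 == 0 or y >= 8.0 for t, y in zip(times, truth)]
values = [y + rng.uniform(-0.1, 0.1) if r else None
          for y, r in zip(truth, recorded)]
missing_idx = [i for i, v in enumerate(values) if v is None]

observed_mean = sum(v for v in values if v is not None) / sum(recorded)
true_mean = sum(truth) / len(truth)
mae_mean = mean_abs_error(mean_impute(values), truth, missing_idx)
mae_time = mean_abs_error(linear_time_impute(times, values), truth, missing_idx)

print(f"true mean {true_mean:.2f}, observed-only mean {observed_mean:.2f}")
print(f"error of mean fill {mae_mean:.2f}, error of time-aware fill {mae_time:.2f}")
```

In this toy setting the observed-only mean overstates the true mean because sicker months are over-represented, and the time-aware fill recovers the missing visits more closely than the mean fill. It says nothing about multiple imputation or about treatment-effect estimation with propensity scores, which the study itself evaluates.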
