4.6 Article

Missing data matter: an empirical evaluation of the impacts of missing EHR data in comparative effectiveness research

Journal

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/jamia/ocad066

Keywords

electronic health records; empirical study; missing data; multiple imputation


This study aimed to quantify the impacts of missing data in comparative effectiveness research (CER) using electronic health records (EHRs) and compare the performance of different imputation methods. Results showed that the spline smoothing method produced results close to those without missing data when the missing data depended on the stochastic progression of disease and medical practice patterns. Compared to multiple imputation, spline smoothing generally performed similarly or better, with smaller estimation bias and less power loss. Therefore, leveraging the temporal information of disease trajectory to impute missing values and considering the missing rate and effect size when choosing an imputation method are important when using EHRs for CER.
Objectives: The impacts of missing data in comparative effectiveness research (CER) using electronic health records (EHRs) may vary depending on the type and pattern of missing data. In this study, we aimed to quantify these impacts and compare the performance of different imputation methods.

Materials and Methods: We conducted an empirical (simulation) study to quantify the bias and power loss in estimating treatment effects in CER using EHR data. We considered various missing-data scenarios and used propensity scores to control for confounding. We compared the performance of multiple imputation and spline smoothing in handling missing data.

Results: When missing data depended on the stochastic progression of disease and medical practice patterns, the spline smoothing method produced results that were close to those obtained when there were no missing data. Compared to multiple imputation, spline smoothing generally performed similarly or better, with smaller estimation bias and less power loss. Multiple imputation could still reduce study bias and power loss in some restrictive scenarios, e.g., when missing data did not depend on the stochastic process of disease progression.

Discussion and Conclusion: Missing data in EHRs could lead to biased estimates of treatment effects and false negative findings in CER even after missing data were imputed. It is important to leverage the temporal information of disease trajectory to impute missing values when using EHRs as a data resource for CER and to consider the missing rate and the effect size when choosing an imputation method.
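To illustrate why temporal information can help when imputing longitudinal EHR values, here is a minimal Python sketch. It is not the study's simulation design. It assumes a lab value that follows a linear trend plus noise, removes interior visits completely at random, and uses linear interpolation along each patient's own trajectory as a simple stand-in for spline smoothing. It has no treatment effect, propensity scores, or multiple imputation, and all function names and parameter values are hypothetical.

```python
import random
import statistics


def simulate_trajectory(n_visits, baseline, slope, noise_sd, rng):
    """Hypothetical lab value at visits 0..n_visits-1: linear trend plus noise."""
    return [baseline + slope * t + rng.gauss(0.0, noise_sd) for t in range(n_visits)]


def mask_values(values, missing_rate, rng):
    """Set interior visits to None completely at random; endpoints stay observed."""
    masked = list(values)
    for t in range(1, len(values) - 1):
        if rng.random() < missing_rate:
            masked[t] = None
    return masked


def impute_mean(masked):
    """Ignore time: fill every gap with the mean of the patient's observed values."""
    fill = statistics.fmean(v for v in masked if v is not None)
    return [fill if v is None else v for v in masked]


def impute_interpolate(masked):
    """Use time: fill gaps by linear interpolation between observed neighbours."""
    filled = list(masked)
    observed = [t for t, v in enumerate(masked) if v is not None]
    for left, right in zip(observed, observed[1:]):
        for t in range(left + 1, right):
            w = (t - left) / (right - left)
            filled[t] = (1 - w) * masked[left] + w * masked[right]
    return filled


def mean_abs_error(truth, imputed, masked):
    """Mean absolute error over the visits that were masked."""
    return statistics.fmean(
        abs(a - b) for a, b, m in zip(truth, imputed, masked) if m is None
    )


def compare_methods(n_patients=200, n_visits=12, missing_rate=0.3, seed=0):
    """Return (mean-imputation error, interpolation error) averaged over patients."""
    rng = random.Random(seed)
    mean_errs, interp_errs = [], []
    for _ in range(n_patients):
        truth = simulate_trajectory(n_visits, rng.gauss(50.0, 5.0), 2.0, 1.0, rng)
        masked = mask_values(truth, missing_rate, rng)
        if None not in masked:
            continue  # nothing to impute for this patient
        mean_errs.append(mean_abs_error(truth, impute_mean(masked), masked))
        interp_errs.append(mean_abs_error(truth, impute_interpolate(masked), masked))
    return statistics.fmean(mean_errs), statistics.fmean(interp_errs)


if __name__ == "__main__":
    err_mean, err_interp = compare_methods()
    print(f"mean imputation MAE:  {err_mean:.2f}")
    print(f"interpolation MAE:    {err_interp:.2f}")
```

Under these assumptions, the time-aware fill should track the trajectory more closely than the time-agnostic one. The sketch says nothing about the harder scenarios the abstract describes, where missingness depends on disease progression and practice patterns.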
