Adjusting for selection bias due to missing data in electronic health records-based research

STATISTICAL METHODS IN MEDICAL RESEARCH (2021)

Journal

STATISTICAL METHODS IN MEDICAL RESEARCH

Volume 30, Issue 10, Pages 2221-2238

Publisher

SAGE PUBLICATIONS LTD

DOI: 10.1177/09622802211027601

Keywords

Electronic health records; inverse probability weighting; missing data; selection bias

Funding

National Institutes of Health [R-01 DK105960, R-01 CA183854]

Automated Summary

This study proposes a new framework for estimation and inference with regression models fitted to electronic health records data, addressing selection bias caused by incomplete or missing data. Simulation results show that the proposed methods have good small-sample properties; however, researchers may need to balance bias against variance when deciding how to handle missing data.

Abstract

While electronic health records data provide unique opportunities for research, numerous methodological issues must be considered. Among these, selection bias due to incomplete/missing data has received far less attention than other issues. Unfortunately, standard missing-data approaches (e.g., inverse-probability weighting and multiple imputation) generally fail to acknowledge the complex interplay of heterogeneous decisions made by patients, providers, and health systems that govern whether specific data elements in the electronic health records are observed. This, in turn, makes the missing-at-random assumption on which standard approaches rely difficult to believe. In the clinical literature, the collection of decisions that gives rise to the observed data is referred to as the data provenance. Building on a recently proposed framework for modularizing the data provenance, we develop a general and scalable framework for estimation and inference with respect to regression models based on inverse-probability weighting that allows for a hierarchy of missingness mechanisms to better align with the complex nature of electronic health records data. We show that the proposed estimator is consistent and asymptotically Normal, derive the form of the asymptotic variance, and propose two consistent estimators of that variance. Simulations show that naive application of standard methods may yield biased point estimates, that the proposed estimators have good small-sample properties, and that researchers may have to contend with a bias-variance trade-off as they consider how to handle missing data. The proposed methods are motivated by an ongoing, electronic health records-based study of bariatric surgery.
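The abstract describes inverse-probability weighting in which the chance that a record is complete is broken into a hierarchy of stage-specific missingness mechanisms. As a minimal sketch of that general idea only (not the authors' estimator, software, or variance method), the Python below simulates a hypothetical two-stage data provenance, estimates each stage's observation probability by cell proportions within levels of `(x, a)`, and weights complete records by the inverse of the product of the stage probabilities. The functions `simulate`, `stage_probabilities`, and `weighted_slope`, the two stages, the auxiliary variable `a`, and every numeric parameter are illustrative assumptions. The sketch also omits the asymptotic-variance estimation the abstract mentions.

```python
import random
from collections import defaultdict


def simulate(n, seed=2021):
    """Simulate a hypothetical two-stage data provenance (all parameters illustrative)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0                   # exposure of interest
        a = 1 if rng.random() < (0.7 if x else 0.3) else 0   # auxiliary variable, associated with x
        y = 1.0 + 2.0 * x + 3.0 * a + rng.gauss(0.0, 1.0)    # outcome
        s1 = rng.random() < (0.9 if a else 0.5)              # stage 1 (hypothetical): a follow-up encounter occurs
        s2 = s1 and rng.random() < (0.8 if a else 0.4)       # stage 2 (hypothetical): y recorded, only possible if s1
        rows.append((x, a, y, s1, s2))
    return rows


def stage_probabilities(rows):
    """Estimate P(S1=1 | x, a) and P(S2=1 | S1=1, x, a) by cell proportions."""
    n, m1, m2 = defaultdict(int), defaultdict(int), defaultdict(int)
    for x, a, _, s1, s2 in rows:
        n[(x, a)] += 1
        m1[(x, a)] += s1   # bools count as 0/1
        m2[(x, a)] += s2
    p1 = {k: m1[k] / n[k] for k in n}
    p2 = {k: m2[k] / m1[k] for k in n}  # conditional on having passed stage 1
    return p1, p2


def weighted_slope(xs, ys, ws):
    """Slope from weighted least squares of y on x with an intercept."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    return sxy / sxx


rows = simulate(100_000)
p1, p2 = stage_probabilities(rows)
complete = [r for r in rows if r[4]]  # records that passed every stage
xs = [r[0] for r in complete]
ys = [r[2] for r in complete]

# Naive complete-case analysis: every observed record gets weight 1.
naive = weighted_slope(xs, ys, [1.0] * len(complete))
# Hierarchical IPW weight: inverse of the product of the stage-specific probabilities.
weights = [1.0 / (p1[(x, a)] * p2[(x, a)]) for x, a, *_ in complete]
ipw = weighted_slope(xs, ys, weights)
print(f"target 3.20 | complete-case {naive:.3f} | IPW {ipw:.3f}")
```

Under these hypothetical settings, the population slope of `y` on `x` is 2 + 3(0.7 - 0.3) = 3.2. The complete-case slope instead converges to roughly 2.86, because both stages preferentially retain records with `a = 1`; the weighted estimate should land near 3.2. Cell proportions are used only to keep the sketch dependency-free. In practice each stage would more typically be modeled with, for example, a logistic regression, and every `(x, a)` cell must contain at least one record that reaches the later stage.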