Article

A data preparation framework for cleaning electronic health records and assessing cleaning outcomes for secondary analysis

Journal

INFORMATION SYSTEMS
Volume 111

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.is.2022.102130

Keywords

Healthcare; Electronic health records (EHR); Secondary analysis; Data quality; Data cleaning

This study proposes a data preparation framework for guiding and validating the cleaning of electronic health record (EHR) data for secondary analysis. The framework includes three core themes: workflow, assessment and cleaning methods, and cleaning evaluation scheme. A case study using data from a large EHR database demonstrates the effectiveness of the framework in organizing and standardizing phases and processes within an EHR data preparation workflow. The cleaning evaluation scheme is particularly effective in validating EHR cleaning methods for handling complex issues in patient demographics, longitudinal EHR attributes, and filtering/imputation cleaning methods.
Even though data preparation constitutes a large proportion of the total effort involved in electronic health record (EHR) based secondary analysis, guidelines for EHR data preparation are still insufficient to date. This study proposes a data preparation framework that can guide and validate the cleaning of EHRs for secondary analysis. The developed framework consists of three core themes: workflow, assessment and cleaning methods, and cleaning evaluation scheme. To illustrate the viability of the proposed framework, we applied it to a hip-fracture readmission scenario using the underlying data extracted from a large EHR database. The case study demonstrated the effectiveness of the proposed framework in organizing and standardizing phases and processes within an EHR data preparation workflow. Furthermore, the cleaning evaluation scheme was found to be effective in validating EHR cleaning methods, especially for those used to handle complex issues that usually appear in patient demographics, longitudinal attributes of EHRs, and the application of filtering and imputation cleaning methods. © 2022 Elsevier Ltd. All rights reserved.
