Article

Implementing Multiple Imputation for Missing Data in Longitudinal Studies When Models are Not Feasible: An Example Using the Random Hot Deck Approach

CLINICAL EPIDEMIOLOGY (2022)

Journal

CLINICAL EPIDEMIOLOGY

Volume 14, Pages 1387-1403

Publisher

DOVE MEDICAL PRESS LTD

DOI: 10.2147/CLEP.S368303

Keywords

multiple imputation; missing data; missing at random; hot deck imputation; random hot deck imputation; longitudinal studies

Abstract

Researchers demonstrated that random hot deck imputation can achieve plausible multiple imputation in longitudinal studies, serving as an alternative method when model-based approaches are infeasible.

Purpose: Researchers often use model-based multiple imputation to handle data that are missing at random and thereby minimize bias. However, constraints within the data may sometimes result in implausible imputed values, making model-based imputation infeasible. In these contexts, we illustrate how random hot deck imputation can allow for plausible multiple imputation in longitudinal studies.

Patients and Methods: Our motivating example is the Childhood Health, Activity, and Motor Performance School Study Denmark (CHAMPS-DK), a prospective cohort study that measured weekly sports participation for 1700 Danish schoolchildren. Using observed data on 4 variables (pain, activity frequency, sport, sport counts), we created a gold-standard data set without missing data. We then created a synthetic data set by setting some variable values to missing based on a prediction model that mimicked real-data missingness patterns. To create 5 imputed data sets, we matched each record with missing data to several fully observed records, generated probabilities from the matched records, and sampled from these records based on the probability of each occurring. We assessed variability and agreement (kappa) between the imputed data sets and the gold-standard data set, and compared the results to common model-based imputation methods.

Results: Variability across data sets appeared reasonable. For the random hot deck approach, kappa was moderate for activity frequency (range 0.65 to 0.71) and sport (range 0.59 to 0.85); for common model-based approaches, kappa for these variables was poor (range 0.00 to 0.11). For sport count, kappa was strong (range 0.87 to 0.97) for random hot deck imputation and weak to moderate (range 0.55 to 0.71) for common model-based imputation. Agreement was higher when more information was present and, for our binary variable sport, when prevalence was higher.

Conclusion: Random hot deck imputation should be considered as an alternative method when model-based approaches are infeasible, specifically where there are constraints within and between covariates.
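The matching-and-sampling step described under Patients and Methods can be illustrated with a minimal Python sketch. This is not the authors' implementation: the exact-match rule, the function names `random_hot_deck` and `cohen_kappa`, and the toy records are all assumptions made for illustration, and the abstract does not specify how donors were matched. The sketch pools the observed values of fully observed donor records that match a record with missing data, turns the pooled values into empirical probabilities, and samples from them once per imputed data set. Cohen's kappa is included as one common agreement statistic; the abstract does not say which kappa variant was used.

```python
import random
from collections import Counter


def random_hot_deck(records, target, match_vars, n_imputations=5, seed=0):
    """Return n_imputations completed copies of records (illustrative only).

    Assumption: donors must match the incomplete record exactly on match_vars.
    """
    rng = random.Random(seed)
    needed = list(match_vars) + [target]
    # Donors are records with no missing value on the matching or target variables.
    donors = [r for r in records if all(r[v] is not None for v in needed)]
    imputed_sets = []
    for _ in range(n_imputations):
        completed = []
        for rec in records:
            rec = dict(rec)  # copy, so the input records are left untouched
            if rec[target] is None:
                pool = [d[target] for d in donors
                        if all(d[v] == rec[v] for v in match_vars)]
                if pool:  # if no donor matches, the value stays missing
                    counts = Counter(pool)
                    values = list(counts)
                    probs = [counts[v] / len(pool) for v in values]
                    rec[target] = rng.choices(values, weights=probs, k=1)[0]
            completed.append(rec)
        imputed_sets.append(completed)
    return imputed_sets


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)


# Made-up toy records (not CHAMPS-DK data); None marks a missing value.
toy = [
    {"pain": 0, "frequency": 2, "sport": 1},
    {"pain": 0, "frequency": 2, "sport": 1},
    {"pain": 0, "frequency": 2, "sport": 0},
    {"pain": 1, "frequency": 1, "sport": 0},
    {"pain": 0, "frequency": 2, "sport": None},
    {"pain": 1, "frequency": 1, "sport": None},
]
imputed = random_hot_deck(toy, target="sport", match_vars=["pain", "frequency"])
```

Because each imputed value is drawn from values actually observed among matching donors, the imputations cannot violate constraints that the observed data already satisfy, which is the property the abstract contrasts with model-based approaches.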
