Article

Selecting the model for multiple imputation of missing data: Just use an IC!

STATISTICS IN MEDICINE (2021)

Journal

STATISTICS IN MEDICINE

Volume 40, Issue 10, Pages 2467-2497

Publisher

WILEY

DOI: 10.1002/sim.8915

Keywords

imputation model selection; information criteria; missing data analysis; stochastic EM algorithm

Categories

Mathematical & Computational Biology; Public, Environmental & Occupational Health; Medical Informatics; Medicine, Research & Experimental; Statistics & Probability

Funding

Australian Government Research Training Program Scholarship


Summary

This article shows that multiple imputation (with improper imputation) is equivalent to a stochastic expectation-maximization approximation to the likelihood, and proposes using likelihood-based model selection tools to choose the best imputation model. Selecting an appropriate imputation model is crucial for minimizing bias in inference, and BIC shows promise for selecting the correct imputation model.

Abstract

Multiple imputation and maximum likelihood estimation (via the expectation-maximization algorithm) are two well-known methods widely used for analyzing data with missing values. Although these two methods are often considered distinct, multiple imputation (when using improper imputation) is actually equivalent to a stochastic expectation-maximization approximation to the likelihood. In this article, we exploit this key result to show that familiar likelihood-based approaches to model selection, such as Akaike's information criterion (AIC) and the Bayesian information criterion (BIC), can be used to choose the imputation model that best fits the observed data. A poor choice of imputation model is known to bias inference, and while sensitivity analysis has often been used to explore the implications of different imputation models, we show that the data can be used to choose an appropriate imputation model via conventional model selection tools. We show that BIC can be consistent for selecting the correct imputation model in the presence of missing data. We verify these results empirically through simulation studies and demonstrate their practicality on two classical missing data examples. An interesting result from the simulations was that parameter estimates can be biased not only by misspecifying the imputation model but also by overfitting it. This emphasizes the importance of using model selection not just to choose the appropriate type of imputation model, but also to decide on the appropriate level of imputation model complexity.
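The idea of choosing an imputation model by an information criterion can be illustrated with a minimal Python sketch. It rests on simplifying assumptions that are ours, not the paper's: a single incomplete variable y, a fully observed covariate x, values of y missing at random given x, and candidate imputation models that are Gaussian polynomial regressions of y on x of degree 1 to 4. In this special case the observed-data likelihood of the y-given-x model depends only on the complete cases, so AIC = 2k - 2 log L and BIC = k log n - 2 log L can be computed from an ordinary least-squares fit, and no stochastic EM iteration is needed. Settings with several incomplete variables would need the stochastic EM approximation described in the article. The simulated data, the helper names design and fit_and_score, and the choice of M = 5 imputations are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2021)


def design(x, degree):
    # Hypothetical helper: polynomial design matrix with columns 1, x, ..., x**degree
    return np.vander(x, degree + 1, increasing=True)


def fit_and_score(y, X):
    # Hypothetical helper: Gaussian linear-model MLE plus AIC and BIC
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / n  # MLE of the error variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    k = X.shape[1] + 1  # regression coefficients plus the variance
    aic = 2 * k - 2 * loglik
    bic = k * np.log(n) - 2 * loglik
    return beta, sigma2, aic, bic


# Simulated (illustrative) data: y depends quadratically on a fully observed x
n = 500
x = rng.normal(size=n)
y_full = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(size=n)

# MAR mechanism: the chance that y is missing depends only on the observed x
p_missing = 0.6 / (1.0 + np.exp(-x))
missing = rng.random(n) < p_missing
obs = ~missing
y = np.where(missing, np.nan, y_full)  # the analyst never sees y_full[missing]

# Score each candidate imputation model on the observed data
scores = {}
for degree in range(1, 5):
    scores[degree] = fit_and_score(y[obs], design(x[obs], degree))

best = min(scores, key=lambda d: scores[d][3])  # smallest BIC
beta, sigma2, _, _ = scores[best]

# Improper imputation: draw missing y from the selected model at the MLE
M = 5
X_mis = design(x[missing], best)
imputations = []
for _ in range(M):
    y_imp = y.copy()
    y_imp[missing] = X_mis @ beta + rng.normal(scale=np.sqrt(sigma2), size=missing.sum())
    imputations.append(y_imp)

for d, (_, _, aic, bic) in scores.items():
    print(f"degree {d}: AIC={aic:.1f}  BIC={bic:.1f}")
print("selected degree (BIC):", best)
```

The loop over polynomial degrees reflects the abstract's point that model selection should govern both the form and the complexity of the imputation model: degree 1 misspecifies the quadratic relationship, while degrees 3 and 4 add parameters that BIC penalizes by log n each. Each completed data set in imputations would then be analyzed and the results pooled in the usual way.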
