Article

Adapting Nearest Neighbor for Multiple Imputation: Advantages, Challenges, and Drawbacks

Journal

Publisher

OXFORD UNIV PRESS INC
DOI: 10.1093/jssam/smab058

Keywords

Approximate Bayesian bootstrap; Finite population; Multiple imputation; Nearest neighbor


The U.S. Census Bureau has historically used nearest neighbor (NN) or random hot deck (RHD) imputation to handle missing data for many types of survey data. Using these methods removes the need to parametrically model values in imputation models. With strong auxiliary information, NN imputation is preferred because it produces more precise estimates than RHD. In addition, NN imputation is robust against a misspecified response mechanism if missingness depends on the auxiliary variable, in contrast to RHD, which ignores the auxiliary information. A compromise between these two methods is k-NN imputation, which identifies a set of the k closest neighbors (donor pool) and randomly selects a single donor from this set. Recently these methods have been used for multiple imputation (MI), enabling variance estimation via Rubin's combining rules. The Approximate Bayesian Bootstrap (ABB) is a simple-to-implement algorithm that makes the RHD proper for MI. In concept, ABB should also propagate uncertainty for NN MI: bootstrapping the respondents means that a nonrespondent's single nearest donor will not be available in every imputation. However, we demonstrate through simulation that NN MI using ABB leads to variance underestimation. This underestimation is somewhat but not entirely attenuated with k-NN imputation. An alternative approach to variance estimation after MI, bootstrapped MI, eliminates the underestimation with NN imputation, but we show that it suffers from overestimation of variance with nonnegligible sampling fractions under both equal and unequal probability sampling designs. We propose a modification to bootstrapped MI to account for nonnegligible sampling fractions. We compare the performance of RHD and the various NN MI methods under a variety of sampling designs, sampling fractions, distribution shapes, and missingness mechanisms.
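To make the abstract's setup concrete, the following is a minimal sketch, not the paper's exact algorithm, of ABB-based k-NN multiple imputation with a single auxiliary variable and of Rubin's combining rules. The function names (`abb_knn_impute`, `rubin_combine`) and the simulated data are illustrative assumptions, not from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

def abb_knn_impute(x, y, resp, k=5, m=20):
    """Illustrative sketch: for each of m imputations, take an Approximate
    Bayesian Bootstrap (ABB) resample of the respondents, then impute each
    nonrespondent by drawing at random one of its k nearest resampled donors
    on the auxiliary variable x. With k=1 this reduces to NN imputation."""
    donors_x, donors_y = x[resp], y[resp]
    imputations = []
    for _ in range(m):
        # ABB step: resample the respondent pool with replacement
        idx = rng.integers(0, len(donors_x), len(donors_x))
        bx, by = donors_x[idx], donors_y[idx]
        y_imp = y.copy()
        for i in np.flatnonzero(~resp):
            # indices of the k closest resampled donors on x; pick one at random
            near = np.argsort(np.abs(bx - x[i]))[:k]
            y_imp[i] = by[rng.choice(near)]
        imputations.append(y_imp)
    return np.array(imputations)

def rubin_combine(estimates, variances):
    """Rubin's combining rules for m completed-data point estimates and
    their estimated sampling variances."""
    m = len(estimates)
    qbar = np.mean(estimates)           # combined point estimate
    ubar = np.mean(variances)           # within-imputation variance
    b = np.var(estimates, ddof=1)       # between-imputation variance
    t = ubar + (1 + 1 / m) * b          # total variance
    return qbar, t
```

The paper's finding is that the total variance `t` from this ABB-based scheme tends to understate the true sampling variance when k is small, which is the motivation for the bootstrapped-MI alternative it studies.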

