Article

Adapting Nearest Neighbor for Multiple Imputation: Advantages, Challenges, and Drawbacks

Journal

Publisher

OXFORD UNIV PRESS INC
DOI: 10.1093/jssam/smab058

Keywords

Approximate Bayesian bootstrap; Finite population; Multiple imputation; Nearest neighbor


The U.S. Census Bureau has historically used nearest neighbor (NN) or random hot deck (RHD) imputation to handle missing data for many types of survey data. Using these methods removes the need to parametrically model values in imputation models. With strong auxiliary information, NN imputation is preferred because it produces more precise estimates than RHD. In addition, NN imputation is robust against a misspecified response mechanism if missingness depends on the auxiliary variable, in contrast to RHD, which ignores the auxiliary information. A compromise between these two methods is k-NN imputation, which identifies a set of the k closest neighbors (donor pool) and randomly selects a single donor from this set. Recently these methods have been used for multiple imputation (MI), enabling variance estimation via Rubin's combining rules. The Approximate Bayesian Bootstrap (ABB) is a simple-to-implement algorithm that makes the RHD proper for MI. In concept, ABB should also propagate uncertainty for NN MI: bootstrapping the respondents means that a nonrespondent's single nearest donor will not be available in every imputation. However, we demonstrate through simulation that NN MI using ABB leads to variance underestimation. This underestimation is somewhat but not entirely attenuated with k-NN imputation. An alternative approach to variance estimation after MI, bootstrapped MI, eliminates the underestimation with NN imputation, but we show that it suffers from overestimation of variance with nonnegligible sampling fractions under both equal and unequal probability sampling designs. We propose a modification to bootstrapped MI to account for nonnegligible sampling fractions. We compare the performance of RHD and the various NN MI methods under a variety of sampling designs, sampling fractions, distribution shapes, and missingness mechanisms.
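To make the abstract's setup concrete, the following is a minimal sketch, not the paper's exact algorithm, of ABB-based k-NN multiple imputation with a single auxiliary variable and of Rubin's combining rules. The function names (`abb_knn_impute`, `rubin_combine`) and the simulated data are illustrative assumptions, not from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

def abb_knn_impute(x, y, resp, k=5, m=20):
    """Illustrative sketch: for each of m imputations, take an Approximate
    Bayesian Bootstrap (ABB) resample of the respondents, then impute each
    nonrespondent by drawing at random one of its k nearest resampled donors
    on the auxiliary variable x. With k=1 this reduces to NN imputation."""
    donors_x, donors_y = x[resp], y[resp]
    imputations = []
    for _ in range(m):
        # ABB step: resample the respondent pool with replacement
        idx = rng.integers(0, len(donors_x), len(donors_x))
        bx, by = donors_x[idx], donors_y[idx]
        y_imp = y.copy()
        for i in np.flatnonzero(~resp):
            # indices of the k closest resampled donors on x; pick one at random
            near = np.argsort(np.abs(bx - x[i]))[:k]
            y_imp[i] = by[rng.choice(near)]
        imputations.append(y_imp)
    return np.array(imputations)

def rubin_combine(estimates, variances):
    """Rubin's combining rules for m completed-data point estimates and
    their estimated sampling variances."""
    m = len(estimates)
    qbar = np.mean(estimates)           # combined point estimate
    ubar = np.mean(variances)           # within-imputation variance
    b = np.var(estimates, ddof=1)       # between-imputation variance
    t = ubar + (1 + 1 / m) * b          # total variance
    return qbar, t
```

The paper's finding is that the total variance `t` from this ABB-based scheme tends to understate the true sampling variance when k is small, which is the motivation for the bootstrapped-MI alternative it studies.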

