Article

Adapting Nearest Neighbor for Multiple Imputation: Advantages, Challenges, and Drawbacks

Journal

JOURNAL OF SURVEY STATISTICS AND METHODOLOGY
Volume 11, Issue 1, Pages 213-233

Publisher

OXFORD UNIV PRESS INC
DOI: 10.1093/jssam/smab058

Keywords

Approximate Bayesian bootstrap; Finite population; Multiple imputation; Nearest neighbor


The U.S. Census Bureau has historically used nearest neighbor (NN) or random hot deck (RHD) imputation to handle missing data for many types of survey data. Using these methods removes the need to parametrically model values in imputation models. With strong auxiliary information, NN imputation is preferred because it produces more precise estimates than RHD. In addition, NN imputation is robust against a misspecified response mechanism if missingness depends on the auxiliary variable, in contrast to RHD, which ignores the auxiliary information. A compromise between these two methods is k-NN imputation, which identifies a set of the k closest neighbors (the donor pool) and randomly selects a single donor from this set. Recently these methods have been used for multiple imputation (MI), enabling variance estimation via Rubin's combining rules. The Approximate Bayesian Bootstrap (ABB) is a simple-to-implement algorithm that makes the RHD proper for MI. In concept, ABB should work to propagate uncertainty for NN MI: bootstrapping respondents means each nonrespondent's single nearest donor will not be available for every imputation. However, we demonstrate through simulation that NN MI using ABB leads to variance underestimation. This underestimation is somewhat, but not entirely, attenuated with k-NN imputation. An alternative approach to variance estimation after MI, bootstrapped MI, eliminates the underestimation with NN imputation, but we show that it suffers from overestimation of variance with nonnegligible sampling fractions under both equal and unequal probability sampling designs. We propose a modification to bootstrapped MI to account for nonnegligible sampling fractions. We compare the performance of RHD and the various NN MI methods under a variety of sampling designs, sampling fractions, distribution shapes, and missingness mechanisms.
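The ABB step described in the abstract (bootstrap the respondents, then draw each nonrespondent's value from its k nearest bootstrapped donors) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a single auxiliary variable with absolute-difference distance, and the function name `abb_knn_impute` is hypothetical.

```python
import numpy as np

def abb_knn_impute(x, y, missing_mask, k=1, m=5, rng=None):
    """Sketch of ABB k-nearest-neighbor multiple imputation.

    For each of m imputations:
      1. Bootstrap the respondents (the ABB step), so the donor pool
         varies across imputations.
      2. For every nonrespondent, find the k nearest bootstrapped donors
         on the auxiliary variable x and draw one donor's y at random.
    k = 1 gives NN imputation; larger k gives the k-NN compromise.
    """
    rng = np.random.default_rng(rng)
    resp = ~missing_mask
    donors_x, donors_y = x[resp], y[resp]
    n_resp = donors_x.size
    imputations = []
    for _ in range(m):
        boot = rng.integers(0, n_resp, size=n_resp)   # ABB resample of respondents
        bx, by = donors_x[boot], donors_y[boot]
        y_imp = y.copy()
        for i in np.flatnonzero(missing_mask):
            pool = np.argsort(np.abs(bx - x[i]))[:k]  # k closest bootstrapped donors
            y_imp[i] = by[rng.choice(pool)]           # random donor from the pool
        imputations.append(y_imp)
    return imputations
```

Point estimates and variances from the m completed datasets would then be combined with Rubin's combining rules; as the abstract notes, this combination underestimates variance for NN MI despite the bootstrap step.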

