Article

Distributed Nonparametric Regression Imputation for Missing Response Problems with Large-scale Data

Journal

JOURNAL OF MACHINE LEARNING RESEARCH
Volume 24

Publisher

MICROTOME PUBL

Keywords

Distributed data; Divide and conquer; Kernel method; Missing data; Sieve method

Nonparametric regression imputation is commonly used in missing data analysis, but it suffers from the curse of dimensionality. This paper proposes two distributed nonparametric regression imputation methods, one based on kernel smoothing and the other on the sieve method, which are evaluated through simulation studies and illustrated in a real data analysis.
Nonparametric regression imputation is commonly used in missing data analysis, but it suffers from the curse of dimensionality. The explosive growth of sample sizes in the era of big data can alleviate this problem, yet large-scale data create challenges of their own in storing the data and computing the estimators. These challenges make classical nonparametric regression imputation methods inapplicable, which motivates us to develop two distributed nonparametric regression imputation methods, one based on kernel smoothing and the other on the sieve method. The kernel-based distributed imputation method has extremely low communication cost, and the sieve-based distributed imputation method can accommodate more local machines. Estimation of the response mean is used to illustrate the proposed imputation methods: two distributed nonparametric regression imputation estimators of the response mean are proposed and proved to be asymptotically normal, with asymptotic variances achieving the semiparametric efficiency bound. The proposed methods are evaluated through simulation studies and illustrated in a real data analysis.
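
To make the idea of kernel-based distributed imputation concrete, the following is a minimal sketch and not the paper's estimator. It assumes a plain divide-and-conquer scheme: each local machine fits a Nadaraya-Watson regression on its own complete cases, imputes its own missing responses, and sends back a single number (its local mean estimate), which the central machine averages. The function names, the Gaussian kernel, the fixed bandwidth, and the synthetic data are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import numpy as np


def local_imputed_mean(x, y, observed, bandwidth):
    """Regression-imputation estimate of E[Y] on one machine (illustrative)."""
    x_obs, y_obs = x[observed], y[observed]
    # Gaussian kernel weights between every local point and every complete case.
    diffs = (x[:, None, :] - x_obs[None, :, :]) / bandwidth
    weights = np.exp(-0.5 * np.sum(diffs ** 2, axis=2))
    # Nadaraya-Watson fit of E[Y | X] at every local covariate value.
    m_hat = weights @ y_obs / weights.sum(axis=1)
    # Keep observed responses; replace missing ones with the fitted values.
    imputed = np.where(observed, y, m_hat)
    return imputed.mean()


def distributed_imputed_mean(x, y, observed, n_machines, bandwidth):
    """Average the local estimates, weighted by local sample size."""
    # Assumption: every block contains at least one complete case.
    blocks = np.array_split(np.arange(len(y)), n_machines)
    local = [local_imputed_mean(x[b], y[b], observed[b], bandwidth) for b in blocks]
    sizes = np.array([len(b) for b in blocks])
    return float(np.average(local, weights=sizes))


# Synthetic example: the response is missing at random given the covariate.
rng = np.random.default_rng(0)
n = 4000
x = rng.uniform(size=(n, 1))
y_full = 1.0 + 2.0 * x[:, 0] + rng.normal(scale=0.5, size=n)
observed = rng.uniform(size=n) < 0.3 + 0.6 * x[:, 0]
y = np.where(observed, y_full, np.nan)

estimate = distributed_imputed_mean(x, y, observed, n_machines=4, bandwidth=0.1)
print("distributed imputation estimate:", estimate)
print("complete-case mean:", y_full[observed].mean())
```

In this sketch each machine communicates only one scalar and its sample size, which is one simple way an imputation scheme can keep communication cost low. In the synthetic design the true response mean is 2, and because responses with larger covariate values are more likely to be observed, the complete-case mean tends to overshoot it.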

