Nearest neighbor selection for iteratively kNN imputation

JOURNAL OF SYSTEMS AND SOFTWARE (2012)

Journal

JOURNAL OF SYSTEMS AND SOFTWARE

Volume 85, Issue 11, Pages 2541-2552

Publisher

ELSEVIER SCIENCE INC

DOI: 10.1016/j.jss.2012.05.073

Keywords

Missing data; k nearest neighbors; kNN imputation

Funding

Australian Research Council (ARC) [DP0985456]
Natural Science Foundation (NSF) of China [61170131]
China 1000-Plan National Distinguished Professorship
China 863 Program [SQ2011AAJY2742]
Guangxi Natural Science Foundation [2012GXNSFGA060004]
Guangxi Bagui Teams for Innovation and Research
Jiangsu Provincial Key Laboratory of E-business at the Nanjing University of Finance and Economics
Research Exchanges with China/India Award, The Royal Academy of Engineering

Abstract

Existing kNN imputation methods for dealing with missing data are designed according to the Minkowski distance or its variants, and have been shown to be generally efficient for numerical variables (features, or attributes). To deal with heterogeneous (i.e., mixed-attribute) data, we propose a novel kNN (k nearest neighbor) imputation method that imputes missing data iteratively, named GkNN (gray kNN) imputation. GkNN selects the k nearest neighbors for each missing datum by calculating the gray distance between the missing datum and all the training data, rather than using a traditional distance metric such as the Euclidean distance. The gray distance can handle both numerical and categorical attributes. To improve effectiveness, GkNN treats all imputed instances (i.e., instances whose missing values have been imputed) as observed data and uses them, together with the complete instances (instances without missing values), to iteratively impute the remaining missing data. We evaluate the proposed approach experimentally and demonstrate that the gray distance is much better than the Minkowski distance both at capturing the proximity relationship (or nearness) of two instances and at dealing with mixed attributes. Moreover, the experimental results show that the GkNN algorithm is much more efficient than existing kNN imputation methods. (c) 2012 Elsevier Inc. All rights reserved.
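
The abstract does not give the formula for the gray distance, so the following is only a rough sketch of the general idea, not the paper's algorithm. It assumes the standard gray relational coefficient from gray relational analysis with a distinguishing coefficient `rho = 0.5`, numerical attributes already scaled to [0, 1], a 0/1 mismatch for categorical attributes, and a single pass in which each imputed row joins the pool of observed rows. The names `attribute_gap`, `gray_grades` and `impute_iteratively` are hypothetical, and the sketch further assumes at least one complete row and at least one observed value in every incomplete row.

```python
from collections import Counter

def attribute_gap(a, b):
    """Absolute gap for numbers, 0/1 mismatch for anything else (assumption)."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return abs(a - b)
    return 0.0 if a == b else 1.0

def gray_grades(query, candidates, rho=0.5):
    """Gray relational grade of each candidate to the query (higher = nearer).

    Only attributes observed in the query (not None) are compared.
    """
    observed = [p for p, v in enumerate(query) if v is not None]
    gaps = [[attribute_gap(query[p], c[p]) for p in observed] for c in candidates]
    flat = [g for row in gaps for g in row]
    lo, hi = min(flat), max(flat)
    if hi == 0:  # every candidate matches the query exactly
        return [1.0] * len(candidates)
    # Mean of the per-attribute gray relational coefficients.
    return [sum((lo + rho * hi) / (g + rho * hi) for g in row) / len(row)
            for row in gaps]

def impute_iteratively(rows, k=3, rho=0.5):
    """Fill None values from the k rows with the highest gray relational grade."""
    rows = [list(r) for r in rows]              # do not mutate the input
    pool = [r for r in rows if None not in r]   # complete instances
    for r in rows:
        if None not in r:
            continue
        grades = gray_grades(r, pool, rho)
        order = sorted(range(len(pool)), key=lambda i: grades[i], reverse=True)
        nearest = [pool[i] for i in order[:k]]
        for p, v in enumerate(r):
            if v is None:
                vals = [n[p] for n in nearest]
                if all(isinstance(x, (int, float)) for x in vals):
                    r[p] = sum(vals) / len(vals)               # numerical: mean
                else:
                    r[p] = Counter(vals).most_common(1)[0][0]  # categorical: mode
        pool.append(r)  # the imputed instance now counts as observed data
    return rows

# Toy mixed-attribute data: two numerical columns and one categorical column.
data = [
    [0.1, "a", 0.2],
    [0.2, "a", 0.3],
    [0.9, "b", 0.8],
    [0.8, "b", 0.7],
    [0.15, "a", None],
    [0.85, None, 0.9],
]
result = impute_iteratively(data, k=2)
```

Ranking by grade rather than by a Minkowski distance is what lets one neighbor search cover numerical and categorical columns together; appending each imputed row to `pool` mirrors, in simplified form, the idea of treating imputed instances as observed data.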
