
An analysis of four missing data treatment methods for supervised learning

APPLIED ARTIFICIAL INTELLIGENCE (2003)

Journal

APPLIED ARTIFICIAL INTELLIGENCE

Volume 17, Issue 5-6, Pages 519-533

Publisher

TAYLOR & FRANCIS INC

DOI: 10.1080/713827181


Abstract

Missing data is a significant data quality problem. Despite how often it occurs and how much it matters, many machine learning algorithms handle missing data in a rather naive way. Missing data treatment should be chosen carefully, however; otherwise bias might be introduced into the induced knowledge. In this work, we analyze the use of the k-nearest neighbor algorithm as an imputation method. Imputation denotes a procedure that replaces the missing values in a data set with plausible values. One advantage of this approach is that the missing data treatment is independent of the learning algorithm used, which allows the user to select the most suitable imputation method for each situation. Our analysis indicates that missing data imputation based on the k-nearest neighbor algorithm can outperform the internal methods used by C4.5 and CN2 to treat missing data, and can also outperform mean or mode imputation, a method widely used to treat missing values.
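
To make the contrast between k-nearest neighbor imputation and mean imputation concrete, the following is a minimal sketch for numeric attributes only. It is not the authors' implementation: the abstract does not specify the distance function, the treatment of nominal attributes, or the value of k, so the Euclidean distance over jointly observed attributes, the averaging of neighbor values, the function names, and the toy data are all assumptions made for illustration.

```python
import math


def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else None)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def _distance(a, b):
    """Euclidean distance over the attributes observed in both rows (assumed metric)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no attribute in common: treat as infinitely far
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def knn_impute(rows, k=2):
    """Replace each None with the mean of that column over the k nearest
    rows in which the column is observed. The input is not modified."""
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: every other row with column j observed.
            candidates = [
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            candidates.sort(key=lambda pair: pair[0])
            neighbors = [v for _, v in candidates[:k]]
            if neighbors:  # leave the value missing if no donor exists
                imputed[i][j] = sum(neighbors) / len(neighbors)
    return imputed


# Hypothetical toy data: two clusters, with the last row missing its third attribute.
data = [
    [1.0, 2.0, 10.0],
    [1.1, 2.1, 11.0],
    [5.0, 6.0, 50.0],
    [5.2, 6.1, 52.0],
    [1.05, 2.05, None],
]

knn_value = knn_impute(data, k=2)[4][2]   # average of the two nearest rows: 10.5
mean_value = mean_impute(data)[4][2]      # column mean over all observed rows: 30.75
```

On this toy data the incomplete row sits in the first cluster, so the neighbor-based estimate stays close to that cluster's values, while the column mean is pulled toward the second cluster. Because both functions run before any learner sees the data, either can be paired with any learning algorithm, which is the independence the abstract highlights. In practice the attributes would usually be rescaled before computing distances; that step is omitted here.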
