Missing value imputation using a fuzzy clustering-based EM approach

KNOWLEDGE AND INFORMATION SYSTEMS (2016)

Journal

KNOWLEDGE AND INFORMATION SYSTEMS

Volume 46, Issue 2, Pages 389-422

Publisher

SPRINGER LONDON LTD

DOI: 10.1007/s10115-015-0822-y

Keywords

Data preprocessing; Data cleansing; Data quality; Missing value imputation; Fuzzy clustering

Abstract

Data preprocessing and cleansing play a vital role in data mining by ensuring good quality of data. Data-cleansing tasks include imputation of missing values, identification of outliers, and identification and correction of noisy data. In this paper, we present a novel technique called A Fuzzy Expectation Maximization and Fuzzy Clustering-based Missing Value Imputation Framework for Data Pre-processing (FEMI). It imputes numerical and categorical missing values by making an educated guess based on records that are similar to the record having a missing value. To identify a group of similar records and make a guess based on that group, it applies a fuzzy clustering approach and our novel fuzzy expectation maximization algorithm. We evaluate FEMI on eight publicly available natural data sets by comparing its performance with that of five high-quality existing techniques, namely EMI, GkNN, FKMI, SVR and IBLLS. We use thirty-two types (patterns) of missing values for each data set. Two evaluation criteria, namely root mean squared error and mean absolute error, are used. Our experimental results indicate (according to a confidence interval and test analysis) that FEMI performs significantly better than EMI, GkNN, FKMI, SVR, and IBLLS.
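To make the general idea concrete, the following is a minimal sketch of membership-weighted imputation for numerical attributes, together with the two evaluation criteria named in the abstract. It is not the FEMI algorithm: the abstract does not give its details, so the sketch assumes that cluster centers are already available and uses the standard fuzzy c-means membership formula in their place. The function names (`fuzzy_memberships`, `impute`, `rmse`, `mae`), the fuzzifier `m`, and the use of `None` to mark a missing value are all illustrative assumptions.

```python
import math

def fuzzy_memberships(record, centers, observed, m=2.0):
    """Standard fuzzy c-means memberships of one record to each center,
    computed only over the attribute indices listed in `observed`."""
    dists = [
        math.sqrt(sum((record[j] - c[j]) ** 2 for j in observed))
        for c in centers
    ]
    # A record lying exactly on one or more centers belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]

def impute(record, centers, m=2.0):
    """Fill each missing (None) numerical value with the membership-weighted
    average of the cluster centers. Assumption: centers are given."""
    observed = [j for j, v in enumerate(record) if v is not None]
    u = fuzzy_memberships(record, centers, observed, m)
    return [
        v if v is not None else sum(u_k * c[j] for u_k, c in zip(u, centers))
        for j, v in enumerate(record)
    ]

def rmse(actual, imputed):
    """Root mean squared error between true and imputed values."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, imputed)) / len(actual)
    )

def mae(actual, imputed):
    """Mean absolute error between true and imputed values."""
    return sum(abs(a - p) for a, p in zip(actual, imputed)) / len(actual)

# Hypothetical two-cluster example: the second attribute is missing.
centers = [[0.0, 0.0], [10.0, 10.0]]
filled = impute([1.0, None], centers)
```

In this toy example the record is much closer to the first center on its observed attribute, so the imputed value is pulled strongly toward that center's value. Categorical attributes, which the abstract says FEMI also handles, are outside the scope of this sketch.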