Article

Missing value imputation using a fuzzy clustering-based EM approach

Journal

KNOWLEDGE AND INFORMATION SYSTEMS
Volume 46, Issue 2, Pages 389-422

Publisher

SPRINGER LONDON LTD
DOI: 10.1007/s10115-015-0822-y

Keywords

Data preprocessing; Data cleansing; Data quality; Missing value imputation; Fuzzy clustering

Data preprocessing and cleansing play a vital role in data mining by ensuring good data quality. Data-cleansing tasks include imputing missing values, identifying outliers, and identifying and correcting noisy data. In this paper, we present a novel technique called A Fuzzy Expectation Maximization and Fuzzy Clustering-based Missing Value Imputation Framework for Data Pre-processing (FEMI). It imputes numerical and categorical missing values by making an educated guess based on records that are similar to the record having a missing value. To identify a group of similar records and make a guess based on that group, it applies a fuzzy clustering approach and our novel fuzzy expectation maximization algorithm. We evaluate FEMI on eight publicly available natural data sets by comparing its performance with that of five high-quality existing techniques, namely EMI, GkNN, FKMI, SVR and IBLLS. We use thirty-two types (patterns) of missing values for each data set. Two evaluation criteria, namely root mean squared error and mean absolute error, are used. Our experimental results indicate (according to a confidence interval and test analysis) that FEMI performs significantly better than EMI, GkNN, FKMI, SVR, and IBLLS.
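
To make the general idea in the abstract concrete, the sketch below shows one simple way a numerical missing value could be guessed from fuzzy cluster memberships, and how the two evaluation criteria (root mean squared error and mean absolute error) are computed. It is not the FEMI algorithm: it assumes the cluster centers are already known, uses the standard fuzzy c-means membership formula on the observed attributes only, handles numerical attributes only, and omits the fuzzy expectation maximization step entirely. The function names (`fuzzy_memberships`, `impute`, `rmse`, `mae`) and the fuzzifier default `m=2.0` are illustrative assumptions, not taken from the paper.

```python
import math


def rmse(actual, imputed):
    """Root mean squared error between original and imputed values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, imputed)) / len(actual))


def mae(actual, imputed):
    """Mean absolute error between original and imputed values."""
    return sum(abs(a - p) for a, p in zip(actual, imputed)) / len(actual)


def fuzzy_memberships(record, centers, m=2.0):
    """Fuzzy c-means style memberships of one record in each cluster.

    Distances use only the attributes that are observed (not None).
    `centers` is assumed to be given; `m` is the usual fuzzifier (> 1).
    """
    observed = [i for i, v in enumerate(record) if v is not None]
    dists = [
        math.sqrt(sum((record[i] - c[i]) ** 2 for i in observed))
        for c in centers
    ]
    # A record lying exactly on one or more centers belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** exponent for dj in dists) for dk in dists]


def impute(record, centers, m=2.0):
    """Fill each missing value with the membership-weighted center value."""
    u = fuzzy_memberships(record, centers, m)
    return [
        v if v is not None else sum(uk * c[i] for uk, c in zip(u, centers))
        for i, v in enumerate(record)
    ]


# Hypothetical toy data: two cluster centers, one record with a missing value.
centers = [[0.0, 0.0], [10.0, 10.0]]
filled = impute([5.0, None], centers)  # record is equidistant from both centers
```

In an evaluation of the kind the abstract describes, values would be removed artificially from a complete data set, imputed, and then compared with the removed originals using `rmse` and `mae`; lower values indicate better imputation.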
