☆ 4.5 Article

Missing value imputation using a fuzzy clustering-based EM approach

KNOWLEDGE AND INFORMATION SYSTEMS (2016)

期刊

KNOWLEDGE AND INFORMATION SYSTEMS

卷 46, 期 2, 页码 389-422

出版社

SPRINGER LONDON LTD

DOI: 10.1007/s10115-015-0822-y

关键词

Data preprocessing; Data cleansing; Data quality; Missing value imputation; Fuzzy clustering

类别

Computer Science, Artificial Intelligence Computer Science, Information Systems

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

Data preprocessing and cleansing play a vital role in data mining by ensuring good quality of data. Data-cleansing tasks include imputation of missing values, identification of outliers, and identification and correction of noisy data. In this paper, we present a novel technique called A Fuzzy Expectation Maximization and Fuzzy Clustering-based Missing Value Imputation Framework for Data Pre-processing (FEMI). It imputes numerical and categorical missing values by making an educated guess based on records that are similar to the record having a missing value. While identifying a group of similar records and making a guess based on the group, it applies a fuzzy clustering approach and our novel fuzzy expectation maximization algorithm. We evaluate FEMI on eight publicly available natural data sets by comparing its performance with the performance of five high-quality existing techniques, namely EMI, GkNN, FKMI, SVR and IBLLS. We use thirty-two types (patterns) of missing values for each data set. Two evaluation criteria namely root mean squared error and mean absolute error are used. Our experimental results indicate (according to a confidence interval and test analysis) that FEMI performs significantly better than EMI, GkNN, FKMI, SVR, and IBLLS.

Missing value imputation using a fuzzy clustering-based EM approach

期刊

KNOWLEDGE AND INFORMATION SYSTEMS

出版社

SPRINGER LONDON LTD

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Missing value imputation using a fuzzy clustering-based EM approach

期刊

KNOWLEDGE AND INFORMATION SYSTEMS

出版社

SPRINGER LONDON LTD

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文