Article

An Improved Mean Imputation Clustering Algorithm for Incomplete Data

Journal

NEURAL PROCESSING LETTERS
Volume 54, Issue 5, Pages 3537-3550

Publisher

SPRINGER
DOI: 10.1007/s11063-020-10298-5

Keywords

Incomplete data; Mean imputation; K-means; Validity index

Funding

  1. National Natural Science Foundation of China [61503160, 61773012]
  2. Natural Science Foundation of the Jiangsu Higher Education Institutions of China [15KJB110004]
  3. Postgraduate Research & Practice Innovation Program of Jiangsu Province [KYCX19_1699]

This study proposes an improved clustering algorithm for handling incomplete data sets caused by random noise, data loss, data acquisition limitations, etc. The algorithm divides the data set into two sets based on missing and non-missing values, and uses mean imputation to fill the missing attribute values. Experimental results demonstrate the effectiveness of the algorithm in clustering.
Incomplete data sets arise in every field of scientific study because of random noise, data loss, limitations of data acquisition, misinterpretation of data, and similar causes. Most clustering algorithms cannot be applied directly to incomplete data sets, because objects with missing values must be preprocessed first. For this reason, this paper presents an improved mean imputation clustering algorithm for incomplete data, based on a partition clustering algorithm. The proposed method divides the universe into two sets: the set of objects with non-missing values and the set of objects with missing values. First, the objects with non-missing values are clustered by a traditional clustering algorithm. Then, for each object with missing values, the mean value of the corresponding attribute in each cluster, taken from the clustering result on the objects with non-missing values, is used in turn to fill the missing attribute value. Perturbation analysis of the cluster centroids is applied to search for the optimal imputation. The clustering results on several UCI data sets are evaluated with several validity indexes, which demonstrate the effectiveness of the proposed algorithm.
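
The procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names `kmeans` and `impute_by_cluster_means` are hypothetical, missing values are represented by `None`, the initial centroids are simply the first k complete objects, and the "perturbation" of a centroid is assumed here to be the distance the centroid would move if the filled object were added to that cluster. The paper's actual perturbation analysis may be defined differently.

```python
import math


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means. Assumption: the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Recompute each centroid as the attribute-wise mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def impute_by_cluster_means(data, k):
    """Fill None entries using cluster means learned from the complete objects."""
    # Step 1: split the universe and cluster only the complete objects.
    complete = [row for row in data if None not in row]
    centroids, labels = kmeans(complete, k)
    sizes = [labels.count(j) for j in range(k)]

    filled = []
    for row in data:
        if None not in row:
            filled.append(list(row))
            continue
        best = None
        # Step 2: try the mean attribute values of each cluster in turn.
        for j, c in enumerate(centroids):
            candidate = [c[a] if v is None else v for a, v in enumerate(row)]
            # Step 3 (assumed criterion): adding x to a cluster of size n moves
            # its centroid by ||x - c|| / (n + 1); keep the smallest shift.
            shift = math.dist(candidate, c) / (sizes[j] + 1)
            if best is None or shift < best[0]:
                best = (shift, candidate)
        filled.append(best[1])
    return filled


if __name__ == "__main__":
    toy = [[1.0, 1.0], [9.0, 9.0], [1.2, 0.8], [8.8, 9.2], [1.1, None], [None, 9.1]]
    for row in impute_by_cluster_means(toy, k=2):
        print(row)
```

In this toy run, each incomplete object receives the mean attribute value of the cluster whose centroid it disturbs least. A validity index could then be computed on the completed data to compare alternative imputations, as the abstract suggests.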
