4.5 Article

An Improved Mean Imputation Clustering Algorithm for Incomplete Data

Journal

NEURAL PROCESSING LETTERS
Volume 54, Issue 5, Pages 3537-3550

Publisher

SPRINGER
DOI: 10.1007/s11063-020-10298-5

Keywords

Incomplete data; Mean imputation; K-means; Validity index

Funding

  1. National Natural Science Foundation of China [61503160, 61773012]
  2. Natural Science Foundation of the Jiangsu Higher Education Institutions of China [15KJB110004]
  3. Postgraduate Research & Practice Innovation Program of Jiangsu Province [KYCX19_1699]


This study proposes an improved clustering algorithm for handling incomplete data sets caused by random noise, data loss, data acquisition limitations, etc. The algorithm divides the data set into two sets based on missing and non-missing values, and uses mean imputation to fill the missing attribute values. Experimental results demonstrate the effectiveness of the algorithm in clustering.
Incomplete data sets are common across scientific fields because of random noise, data loss, limitations of data acquisition, data misunderstanding, and so on. Most clustering algorithms cannot be applied to incomplete data sets directly, because objects with missing values must be preprocessed first. For this reason, this paper presents an improved mean imputation clustering algorithm for incomplete data, based on a partition clustering algorithm. In the proposed method, the universe is divided into two sets: the set of objects with non-missing values and the set of objects with missing values. First, the objects with non-missing values are clustered by a traditional clustering algorithm. Then, for each object with missing values, each missing attribute value is filled with the mean value of that attribute in each cluster in turn, based on the clustering result for the objects with non-missing values. Perturbation analysis of the cluster centroids is applied to search for the optimal imputation. The clustering results on several UCI data sets are evaluated with several validity indexes, and they demonstrate the effectiveness of the proposed algorithm.
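
The procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (kmeans, impute_by_cluster_means) are hypothetical, a plain K-means with a deterministic farthest-point initialization stands in for the "traditional clustering algorithm", and the "perturbation analysis" is read here as choosing the candidate imputation that shifts a cluster centroid the least. The paper's exact criterion may differ.

```python
import numpy as np


def kmeans(X, k, n_iter=100):
    """Plain Lloyd's K-means with farthest-point initialization (an assumption)."""
    # Start from the first row, then repeatedly add the point farthest
    # from the centroids chosen so far.
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids, dtype=float)

    for _ in range(n_iter):
        # Assign every object to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster is empty.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels


def impute_by_cluster_means(X, k):
    """Fill NaN entries using per-cluster attribute means of the complete objects."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    complete_rows = ~missing.any(axis=1)

    # Step 1: cluster only the objects without missing values.
    centroids, labels = kmeans(X[complete_rows], k)
    sizes = np.bincount(labels, minlength=k)

    X_filled = X.copy()
    # Step 2: for each incomplete object, try the mean of every cluster.
    for i in np.where(~complete_rows)[0]:
        best_shift, best_candidate = np.inf, None
        for j in range(k):
            # Candidate: observed values kept, missing ones taken from centroid j.
            candidate = np.where(missing[i], centroids[j], X[i])
            # Assumed perturbation measure: how far centroid j would move
            # if this candidate were added to cluster j.
            new_centroid = (sizes[j] * centroids[j] + candidate) / (sizes[j] + 1)
            shift = np.linalg.norm(new_centroid - centroids[j])
            if shift < best_shift:
                best_shift, best_candidate = shift, candidate
        X_filled[i] = best_candidate
    return X_filled
```

After imputation, the completed data set can be clustered again (for example with the same kmeans helper) and scored with a validity index of choice. Note that the centroid-shift measure used here tends to favor larger clusters, which is a property of this sketch rather than a claim about the published method.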
