Article

An Improved Mean Imputation Clustering Algorithm for Incomplete Data

NEURAL PROCESSING LETTERS (2022)

Journal

NEURAL PROCESSING LETTERS

Volume 54, Issue 5, Pages 3537-3550

Publisher

SPRINGER

DOI: 10.1007/s11063-020-10298-5

Keywords

Incomplete data; Mean imputation; K-means; Validity index

Funding

National Natural Science Foundation of China [61503160, 61773012]
Natural Science Foundation of the Jiangsu Higher Education Institutions of China [15KJB110004]
Postgraduate Research & Practice Innovation Program of Jiangsu Province [KYCX19_1699]

Automated Summary

This study proposes an improved clustering algorithm for incomplete data sets, which arise from random noise, data loss, limitations of data acquisition, and similar causes. The algorithm splits the data set into objects with missing values and objects without, clusters the complete objects, and fills each missing attribute value with a cluster mean. Experimental results support the effectiveness of the algorithm in clustering.

Abstract

Incomplete data sets arise in every field of scientific study because of random noise, data loss, limitations of data acquisition, misinterpretation of data, and similar causes. Most clustering algorithms cannot be applied to incomplete data sets directly, because objects with missing values must first be preprocessed. This paper therefore presents an improved mean imputation clustering algorithm for incomplete data, built on a partition clustering algorithm. The proposed method divides the universe into two sets: the objects with no missing values and the objects with missing values. First, the objects with no missing values are clustered by a traditional clustering algorithm. Then, for each object with missing values, each missing attribute value is filled with the mean value of that attribute in each cluster in turn, using the clustering result of the complete objects. Perturbation analysis of the cluster centroids is applied to search for the optimal imputation. Clustering results on several UCI data sets are evaluated with several validity indexes and indicate that the proposed algorithm is effective.
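The steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the perturbation analysis is carried out, so the selection rule used here (pick the cluster whose centroid would move least if the imputed object joined it) is an assumption, as are the function names `kmeans` and `impute_by_cluster_mean`, the use of `None` to mark a missing value, and plain Lloyd's k-means as the "traditional clustering algorithm".

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on complete rows; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]

    def assign(cs):
        # Nearest centroid by Euclidean distance (ties go to the lower index).
        return [min(range(k), key=lambda j: math.dist(p, cs[j])) for p in points]

    for _ in range(iters):
        labels = assign(centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(centroids)


def impute_by_cluster_mean(data, k, seed=0):
    """Fill None entries with attribute means of a cluster of complete rows."""
    # Step 1: split the universe into complete and incomplete objects.
    complete = [row for row in data if None not in row]
    # Step 2: cluster only the complete objects.
    centroids, labels = kmeans(complete, k, seed=seed)
    sizes = [labels.count(j) for j in range(k)]

    filled = []
    for row in data:
        if None not in row:
            filled.append(list(row))
            continue
        observed = [i for i, v in enumerate(row) if v is not None]

        def centroid_shift(j):
            # ASSUMPTION (not from the paper): if the row is filled with the
            # means of cluster j and added to it, the centroid moves by
            # ||x_obs - c_obs|| / (n_j + 1); the filled attributes add nothing.
            d = math.sqrt(sum((row[i] - centroids[j][i]) ** 2 for i in observed))
            return d / (sizes[j] + 1)

        # Step 3: try every cluster's means and keep the least disruptive one.
        best = min(range(k), key=centroid_shift)
        filled.append(
            [centroids[best][i] if v is None else v for i, v in enumerate(row)]
        )
    return filled


if __name__ == "__main__":
    toy = [
        [0, 0], [0, 1], [1, 0], [1, 1],
        [10, 10], [10, 11], [11, 10], [11, 11],
        [0.5, None], [None, 10.5],
    ]
    for row in impute_by_cluster_mean(toy, k=2):
        print(row)
```

On the toy data, each incomplete object is filled from the cluster that agrees with its observed attribute. The paper's actual perturbation criterion and validity indexes may differ from what is assumed in this sketch.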
