4.7 Article

A mixed data clustering algorithm with noise-filtered distribution centroid and iterative weight adjustment strategy

Journal

INFORMATION SCIENCES
Volume 577, Pages 697-721

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2021.07.039

Keywords

Mixed data clustering; Noise-filtered distribution centroid; Iterative weight adjustment strategy; Intra-cluster homogeneity; Inter-cluster heterogeneity

Funding

  1. National Natural Science Foundation of China [61862042, 61762062, U1936120]
  2. National Key R&D Program of China [2017YFB0802805, 2017YFB0801701]
  3. Science and Technology Innovation Platform Project of Jiangxi Province [20181BCD40005]
  4. Major Discipline Academic and Technical Leader Training Plan Project of Jiangxi Province [20172BCB22030]
  5. Primary Research & Development Plan of Jiangxi Province [20192BBE50075, 20192BEL50041, 20181ACE50033]
  6. Jiangxi Province Natural Science Foundation of China [20192BAB207019, 20192BAB207020]
  7. Graduate Innovation Fund Project of Jiangxi Province [YC2019-S100, YC2019-S048, YC2020-S028, YC2020-S092, YC2020-S083]
  8. Jiangxi Province Educational Reform Key Project [JXJG-2020-1-2]
  9. Jiangxi Double Thousand Plan [JSXQ201901075]
  10. Practice Innovation Training Program of Jiangxi Province for College Students [20190403041, 20190402125, 2020CX160]

This paper proposes a mixed data clustering algorithm with a noise-filtered distribution centroid and an iterative weight adjustment strategy, which improves clustering performance on mixed data.
Clustering is an important technique for data analysis, yet cluster analysis of mixed data remains challenging. This paper proposes a mixed data clustering algorithm with a noise-filtered distribution centroid and an iterative weight adjustment strategy. The algorithm defines a noise-filtered distribution centroid for categorical attributes and combines it with the mean of the numeric attributes to represent the center of a cluster with mixed attributes. The noise-filtered distribution centroid records the frequency of occurrence of each possible value of the categorical attributes in a cluster more accurately. Furthermore, because noise values are filtered out, the measure of dissimilarity between data objects and cluster centers can be improved. In addition, the algorithm introduces an iterative weight adjustment strategy that combines intra-cluster and inter-cluster information. A unified weight measurement method is used for both numeric and categorical attributes. Attributes with higher intra-cluster homogeneity and inter-cluster heterogeneity are treated as higher-priority attributes and tend to be assigned relatively heavier weights during clustering. Experimental results on different datasets from the UCI repository show that the proposed MCFCIW algorithm outperforms existing partition-based clustering algorithms and data-conversion-based clustering algorithms for mixed data on both cluster validity indices and convergence speed. (c) 2021 Elsevier Inc. All rights reserved.
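
The abstract does not give the paper's formulas, so the following is only a minimal Python sketch of the general idea of a noise-filtered distribution centroid and a weighted mixed-attribute dissimilarity. The function names, the `min_freq` threshold, the rule of dropping rare category values and renormalizing, and the specific distance terms are all assumptions made for illustration; they are not the MCFCIW definitions.

```python
from collections import Counter


def noise_filtered_centroid(values, min_freq=0.1):
    """Distribution centroid of one categorical attribute in one cluster.

    Assumption (not from the paper): a category value whose relative
    frequency in the cluster is below `min_freq` is treated as noise,
    dropped, and the remaining frequencies are renormalized.
    """
    counts = Counter(values)
    total = len(values)
    kept = {v: c for v, c in counts.items() if c / total >= min_freq}
    if not kept:
        # Every value fell below the threshold: keep the raw distribution.
        kept = dict(counts)
    kept_total = sum(kept.values())
    return {v: c / kept_total for v, c in kept.items()}


def mixed_dissimilarity(obj_num, obj_cat, center_num, center_cat,
                        weights_num, weights_cat):
    """Weighted dissimilarity between an object and a mixed cluster center.

    Numeric part: weighted squared difference from the cluster mean.
    Categorical part: weighted (1 - frequency of the object's value in
    the centroid); a value absent from the centroid costs the full weight.
    Both terms are illustrative choices, not the paper's measure.
    """
    num_part = sum(w * (x - m) ** 2
                   for w, x, m in zip(weights_num, obj_num, center_num))
    cat_part = sum(w * (1.0 - dist.get(v, 0.0))
                   for w, v, dist in zip(weights_cat, obj_cat, center_cat))
    return num_part + cat_part


# Hypothetical cluster: one categorical attribute with a rare misspelled value.
colors = ["red"] * 6 + ["blue"] * 3 + ["grren"]
centroid = noise_filtered_centroid(colors, min_freq=0.2)
d_red = mixed_dissimilarity([1.0], ["red"], [1.0], [centroid], [1.0], [1.0])
d_noise = mixed_dissimilarity([1.0], ["grren"], [1.0], [centroid], [1.0], [1.0])
```

In this toy cluster the rare value `"grren"` is dropped from the centroid, so an object carrying it is maximally dissimilar on that attribute, while `"red"` and `"blue"` share the renormalized frequency mass. The weight arguments are where an iterative adjustment step, such as one favoring attributes with high intra-cluster homogeneity and inter-cluster heterogeneity, would plug in; that step is not sketched here.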
