Article

Improved K-means algorithm for clustering non-spherical data

Journal

EXPERT SYSTEMS
Volume 39, Issue 9

Publisher

WILEY
DOI: 10.1111/exsy.13062

Keywords

arbitrary shape; clustering; improved algorithm; K-means; non-spherical

Funding

  1. Guangdong Edu Science Project Plan [2021GXJK513]
  2. Lianyungang High-tech Zone Science and Technology Project Plan [ZD201915]
  3. Lianyungang Technical College Project Plan [XZD202001]
  4. Shenzhen Edu Science Project Plan [DWZZ19002]

This study proposes an improved K-means algorithm (IK-means) to enhance clustering efficiency for non-spherical data. IK-means first clusters the original dataset into high-density sub-clusters and then merges them. It shows good clustering capability for data of arbitrary shape and, on larger datasets, is faster than DBSCAN and KGFCM.
K-means is one of the most commonly used data mining algorithms. It clusters quickly, but it is less effective on non-spherical data. An improved K-means algorithm (IK-means) is proposed to enhance clustering efficiency for such data. The original dataset is first clustered into a relatively large number of high-density sub-clusters, and the final result is obtained by merging connected sub-clusters. Connectivity between sub-clusters is evaluated from the sub-cluster density and the nearest distance class between sub-clusters. In tests on University of California, Irvine (UCI) datasets and several artificial simulation datasets, a comparison of IK-means against DBSCAN and KGFCM shows its clustering capability for data of arbitrary shape. On a dataset of size 72,000, its Adjusted Rand Index (ARI) is 24% higher than that of DBSCAN and 95.2% higher than that of KGFCM. On larger datasets, IK-means is also faster than both DBSCAN and KGFCM.
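
The two-stage idea described above (over-cluster with K-means, then merge connected sub-clusters) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not give the exact connectivity test, so the sketch substitutes a much simpler, assumed rule: two sub-clusters count as connected when any pair of their points lies within a distance `eps`. The function names, the deterministic farthest-point seeding, the parameter `eps`, and the toy data are all hypothetical choices made for illustration.

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def farthest_point_init(points, k):
    # Assumed deterministic seeding (not from the paper): start at points[0],
    # then repeatedly add the point farthest from the centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=50):
    # Plain Lloyd's K-means; returns one sub-cluster label per point.
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a sub-cluster becomes empty
                centers[j] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return labels

def merge_subclusters(points, labels, k, eps):
    # Assumed connectivity rule: union two sub-clusters when any pair of their
    # points is within eps. This stands in for the paper's criterion based on
    # density and nearest distance. The O(n^2) pair scan is for clarity only.
    parent = list(range(k))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if labels[i] != labels[j] and dist(points[i], points[j]) <= eps:
                parent[find(labels[i])] = find(labels[j])
    return [find(lab) for lab in labels]

def cluster_by_merging(points, k, eps):
    return merge_subclusters(points, kmeans(points, k), k, eps)

# Toy data: two elongated (non-spherical) strips, 12 units apart.
strip_a = [(0.5 * i, 0.0) for i in range(21)]
strip_b = [(0.5 * i, 12.0) for i in range(21)]
points = strip_a + strip_b

final = cluster_by_merging(points, k=6, eps=1.0)
print(len(set(final)))  # 2: the six sub-clusters merge into one cluster per strip
```

The number of sub-clusters `k` is deliberately larger than the number of true clusters, so each sub-cluster covers only a small, roughly compact piece of a strip. The merge step then recovers the elongated shapes. A rule that also weighs sub-cluster density, as the abstract describes, would be more robust to noise than this distance-only stand-in.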
