Intelligent Choice of the Number of Clusters in K-Means Clustering: An Experimental Study with Different Cluster Spreads

JOURNAL OF CLASSIFICATION (2010)

Journal

JOURNAL OF CLASSIFICATION

Volume 27, Issue 1, Pages 3-40

Publisher

SPRINGER

DOI: 10.1007/s00357-010-9049-5

Keywords

K-Means clustering; Number of clusters; Anomalous pattern; Hartigan's rule; Gap statistic

Abstract

The problem of determining the right number of clusters in K-Means has attracted considerable interest, especially in recent years. Cluster intermix appears to be the factor that most affects clustering results. This paper proposes an experimental setting for comparing different approaches on data generated from Gaussian clusters, with controlled between- and within-cluster spread parameters that model cluster intermix. The setting allows centroid recovery to be evaluated on par with the conventional evaluation of cluster recovery. Our main interest is in two versions of the intelligent K-Means method, iK-Means, which find the right number of clusters by extracting anomalous patterns from the data one by one. We compare them with seven other methods, including Hartigan's rule, averaged Silhouette width, and the Gap statistic, under different between- and within-cluster spread-shape conditions. The experimental results show several consistent patterns; for example, Hartigan's rule reproduces the right K best, but not the clusters or their centroids. This leads us to propose an adjusted version of iK-Means, which performs well in the current experimental setting.
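To make one of the compared methods concrete, here is a minimal sketch of Hartigan's rule in its commonly cited form: compute H(K) = (W_K / W_{K+1} - 1)(N - K - 1), where W_K is the within-cluster sum of squares for K clusters and N is the number of points, and take the smallest K for which H(K) is at most 10. The abstract does not give the paper's exact settings, so this is an illustration only. The helper names (`gaussian_clusters`, `kmeans_wss`, `hartigan_index`, `choose_k`), the plain Lloyd-style K-Means with random restarts, and the cluster centres and spread in the demo are all assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gaussian_clusters(centers, sigma, n_per, seed=0):
    """Hypothetical generator: n_per spherical Gaussian points around each centre.

    The distance between centres plays the role of between-cluster spread,
    sigma the role of within-cluster spread (an assumed simplification).
    """
    rng = random.Random(seed)
    return [tuple(rng.gauss(m, sigma) for m in c)
            for c in centers for _ in range(n_per)]


def kmeans_wss(points, k, n_init=10, n_iter=100, seed=0):
    """Best within-cluster sum of squares W_K over several random starts."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(n_init):
        centroids = rng.sample(points, k)  # random data points as initial centroids
        for _ in range(n_iter):
            clusters = [[] for _ in range(k)]
            for p in points:
                j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                clusters[j].append(p)
            # Recompute each centroid as its cluster mean; keep it if the cluster is empty.
            new = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[j]
                   for j, cl in enumerate(clusters)]
            if new == centroids:
                break
            centroids = new
        wss = sum(min(sq_dist(p, c) for c in centroids) for p in points)
        best = min(best, wss)
    return best


def hartigan_index(w_k, w_k1, n, k):
    """H(K) = (W_K / W_{K+1} - 1) * (N - K - 1)."""
    return (w_k / w_k1 - 1.0) * (n - k - 1)


def choose_k(ws, n, threshold=10.0):
    """ws[i] holds W for K = i + 1; return the smallest K with H(K) <= threshold.

    Returns None if no K in the tested range satisfies the rule.
    """
    for k in range(1, len(ws)):
        if hartigan_index(ws[k - 1], ws[k], n, k) <= threshold:
            return k
    return None


if __name__ == "__main__":
    # Assumed toy setting: three well-separated 2-D clusters of 50 points each.
    data = gaussian_clusters([(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)], sigma=1.0, n_per=50)
    ws = [kmeans_wss(data, k) for k in range(1, 7)]
    print("K chosen by Hartigan's rule:", choose_k(ws, len(data)))
```

Shrinking the distance between centres or increasing `sigma` in the demo raises cluster intermix, which is the condition the experimental setting is designed to control.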
