Article

Can the Number of Clusters Be Determined by External Indices?

Journal

IEEE Access
Volume 8, Pages 89239-89257

Publisher

Institute of Electrical and Electronics Engineers (IEEE)
DOI: 10.1109/ACCESS.2020.2993295

Keywords

Clustering algorithms; Indexes; Stability criteria; Linear programming; Task analysis; Clustering; cluster validation; stability; number of clusters; external index; resampling

External indices have been used in the literature to determine the number of clusters. The idea is to add randomness to the clustering process and measure the stability of the results with an external validity index. The hypothesis is that the results are most stable when the correct number of clusters is used. The goal of this paper is to answer the research question stated in its title. We conduct a systematic study of the main components of the stability-based approach: how to add randomness to the process, how to perform the cross-validation, and which external index to use. We show that the number of clusters can be reliably determined only when the type of clusters is known and all components of the approach are carefully chosen. Inferior algorithms such as k-means, a subsampling rate that is too high or too low, a null reference for normalization, and ineffective validation indices can each cause the stability-based approach to break. We recommend better design choices for all of these components, which lead to better results than existing stability-based methods. However, even with the best choices, there are pathological cases in which the stability-based method fails.
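The stability idea described in the abstract can be made concrete with a small sketch. It is not the authors' method and does not reproduce their recommended design choices. It assumes k-means++ with restarts as the base clusterer (the abstract itself calls k-means an inferior choice; it is used here only because it is short), a subsampling rate of 0.8, pairs of independent subsamples compared on their overlapping points, and the adjusted Rand index (ARI) as the external index. It also omits any null-reference normalization. All function names (`kmeans`, `adjusted_rand_index`, `stability`, `estimate_k`) are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical sketch of stability-based selection of k (not the paper's method).
# Assumptions: k-means++ with restarts as base clusterer, two independent
# subsamples per trial compared on their overlap, ARI as the external index,
# and no null-reference normalization.


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, n_init=5, iters=100):
    """Lloyd's k-means with k-means++ seeding; returns labels of the lowest-SSE restart."""
    best_labels, best_sse = None, math.inf
    for _ in range(n_init):
        # k-means++ seeding: pick each new center with probability proportional to D(x)^2.
        centers = [rng.choice(points)]
        while len(centers) < k:
            d2 = [min(sq_dist(p, c) for c in centers) for p in points]
            if sum(d2) == 0:
                centers.append(rng.choice(points))
            else:
                centers.append(rng.choices(points, weights=d2, k=1)[0])
        labels = None
        for _ in range(iters):
            new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
            if new_labels == labels:
                break
            labels = new_labels
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if members:  # keep the previous center if a cluster becomes empty
                    centers[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
        sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
        if sse < best_sse:
            best_labels, best_sse = labels, sse
    return best_labels


def pairs2(n):
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2


def adjusted_rand_index(a, b):
    """Pair-counting adjusted Rand index between two labelings (needs >= 2 items)."""
    index = sum(pairs2(c) for c in Counter(zip(a, b)).values())
    sum_a = sum(pairs2(c) for c in Counter(a).values())
    sum_b = sum(pairs2(c) for c in Counter(b).values())
    expected = sum_a * sum_b / pairs2(len(a))
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # only when both partitions are identical and trivial
        return 1.0
    return (index - expected) / (max_index - expected)


def stability(points, k, rng, n_pairs=10, rate=0.8):
    """Mean ARI between clusterings of two random subsamples, measured on their overlap."""
    n = len(points)
    m = int(rate * n)
    scores = []
    for _ in range(n_pairs):
        runs = []
        for _ in range(2):
            idx = rng.sample(range(n), m)
            labels = kmeans([points[i] for i in idx], k, rng)
            runs.append(dict(zip(idx, labels)))  # point index -> cluster label
        common = sorted(runs[0].keys() & runs[1].keys())
        scores.append(adjusted_rand_index([runs[0][i] for i in common],
                                          [runs[1][i] for i in common]))
    return sum(scores) / len(scores)


def estimate_k(points, k_values, seed=0):
    """Return the most stable k (ties go to the first k tried) and all scores."""
    rng = random.Random(seed)
    scores = {k: stability(points, k, rng) for k in k_values}
    return max(scores, key=scores.get), scores


# Toy data: three well-separated 2-D blobs (illustrative only).
gen = random.Random(42)
data = [(cx + gen.gauss(0, 0.5), cy + gen.gauss(0, 0.5))
        for cx, cy in [(0, 0), (10, 0), (5, 8.66)] for _ in range(30)]
best_k, scores = estimate_k(data, range(2, 7))
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

On clean, well-separated blobs like these, the score for k = 3 should be at or near 1.0. As the abstract warns, such a naive version can still break with an unsuitable base algorithm, subsampling rate, or index, so the scores are best read as evidence rather than a verdict.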
