Article

Can the Number of Clusters Be Determined by External Indices?

Journal

IEEE ACCESS
Volume 8, Issue -, Pages 89239-89257

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2020.2993295

Keywords

Clustering algorithms; Indexes; Stability criteria; Linear programming; Task analysis; Clustering; cluster validation; stability; number of clusters; external index; resampling

External indices have been used in the literature for determining the number of clusters. The idea is to measure the stability of clustering results using an external validity index when randomness is added to the clustering process. The hypothesis is that the clustering results are more stable when the correct number of clusters is used. The goal of this paper is to answer the research question stated in the title. We conduct a systematic study of the main components of the stability-based approach. We discuss how to add randomness to the process, how to perform the cross-validation, and which external index to use. We show that the number of clusters can be reliably determined only when the type of clusters is known and all the components of the approach are carefully chosen. Inferior algorithms like $k$-means, a subsampling rate that is too high or too low, null reference for normalization, and ineffective validation indices can all cause the stability-based approach to break. We recommend better design choices for all these components, which lead to better results compared to existing stability-based methods. However, even with the best choices, there are pathological cases where the stability-based method fails.
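
The stability idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the subsampling rate of 0.8, the 20 rounds, the adjusted Rand index as the external index, and the toy three-blob data are all assumptions chosen for illustration, and the function names (`adjusted_rand_index`, `kmeans`, `stability`, `make_blobs`) are hypothetical. It deliberately uses $k$-means, which the abstract calls an inferior choice, only because it is short to write in plain Python.

```python
import math
import random
from collections import Counter


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same points."""
    total = math.comb(len(a), 2)
    if total == 0:
        return 1.0
    pairs = lambda counts: sum(math.comb(c, 2) for c in counts)
    together = pairs(Counter(zip(a, b)).values())
    in_a, in_b = pairs(Counter(a).values()), pairs(Counter(b).values())
    expected = in_a * in_b / total
    maximum = (in_a + in_b) / 2
    if maximum == expected:  # degenerate case: both labelings are trivial
        return 1.0
    return (together - expected) / (maximum - expected)


def kmeans(points, k, rng, iterations=50):
    """Lloyd's algorithm with farthest-first initialisation; returns labels."""
    def dist2(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def stability(points, k, rate=0.8, rounds=20, seed=0):
    """Mean ARI between clusterings of two random subsamples, on shared points."""
    rng = random.Random(seed)
    n, size = len(points), int(rate * len(points))
    scores = []
    for _ in range(rounds):
        first = rng.sample(range(n), size)
        second = rng.sample(range(n), size)
        labels_1 = dict(zip(first, kmeans([points[i] for i in first], k, rng)))
        labels_2 = dict(zip(second, kmeans([points[i] for i in second], k, rng)))
        shared = sorted(set(first) & set(second))
        scores.append(adjusted_rand_index([labels_1[i] for i in shared],
                                          [labels_2[i] for i in shared]))
    return sum(scores) / len(scores)


def make_blobs(seed=0, per_blob=40):
    """Toy data (assumption): three tight, well-separated 2-D Gaussian blobs."""
    rng = random.Random(seed)
    return [(rng.gauss(cx, 0.3), rng.gauss(cy, 0.3))
            for cx, cy in [(0, 0), (10, 0), (0, 10)] for _ in range(per_blob)]


points = make_blobs()
scores = {k: stability(points, k) for k in range(2, 7)}
for k, score in scores.items():
    print(f"k={k}: mean ARI across subsample pairs = {score:.3f}")
```

On such easy data the score for the true number of clusters should be at or near 1, but other values of $k$ can score highly as well. Taking the raw maximum is therefore not a reliable selection rule in general, which is consistent with the abstract's warning that the approach breaks when its components are poorly chosen.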
