4.6 Article

Revisiting agglomerative clustering

Publisher

ELSEVIER
DOI: 10.1016/j.physa.2021.126433

Keywords

Clustering; Hierarchical clustering; Agglomerative clustering; False positive

Funding

  1. Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), Brazil [2019/01077-3, 18/09125-4, 15/22308-2]
  2. CNPq, Brazil [307085/2018-0]

Hierarchical agglomerative methods are effective and popular for clustering data, but they have not been systematically compared with respect to false positives when searching for clusters. A cluster model consisting of a higher-density nucleus, a transition, and outliers is used to quantify the relevance of the obtained clusters and to address false positives. Experimental results show that many methods detect two clusters in unimodal data, with the single-linkage method being more resilient to false positives.

Hierarchical agglomerative methods stand out as particularly effective and popular approaches for clustering data. Yet, these methods have not been systematically compared regarding the important issue of false positives while searching for clusters. A model of clusters involving a higher-density nucleus surrounded by a transition, followed by outliers, is adopted as a means to quantify the relevance of the obtained clusters and address the problem of false positives. Six traditional methodologies, namely the single, average, median, complete, centroid and Ward's linkage criteria, are compared with respect to the adopted model. Unimodal and bimodal datasets obeying uniform, Gaussian, exponential and power-law distributions are considered for this comparison. The obtained results include the verification that many methods detect two clusters in unimodal data. The single-linkage method was found to be more resilient to false positives. Also, several methods detected clusters not corresponding directly to the nucleus. © 2021 Elsevier B.V. All rights reserved.
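
As a rough illustration of the kind of comparison the abstract describes, the sketch below runs the six linkage criteria on a single Gaussian blob (unimodal data) and reports the sizes of the two clusters joined at the final merge. It assumes SciPy's `scipy.cluster.hierarchy.linkage` with its default Euclidean metric. The function names `top_split_sizes` and `compare_methods`, the sample size, and the seed are hypothetical choices made for this example. It does not implement the paper's nucleus/transition/outlier relevance measure; it only shows how the top-level split can be inspected.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# The six linkage criteria named in the abstract, as SciPy method names.
METHODS = ["single", "average", "median", "complete", "centroid", "ward"]


def top_split_sizes(points, method):
    """Return (larger, smaller) sizes of the two clusters joined by the last merge."""
    n = len(points)
    Z = linkage(points, method=method)  # Euclidean distance by default

    def size(cluster_id):
        # Ids below n are original observations; id n + i is the cluster
        # formed at row i of Z, whose size is stored in column 3.
        cluster_id = int(cluster_id)
        return 1 if cluster_id < n else int(Z[cluster_id - n, 3])

    a, b = size(Z[-1, 0]), size(Z[-1, 1])
    return max(a, b), min(a, b)


def compare_methods(n_points=200, seed=0):
    """Apply every linkage criterion to the same unimodal 2-D Gaussian sample."""
    rng = np.random.default_rng(seed)
    points = rng.normal(size=(n_points, 2))  # illustrative data, not from the paper
    return {method: top_split_sizes(points, method) for method in METHODS}


if __name__ == "__main__":
    for method, (large, small) in compare_methods().items():
        print(f"{method:>9}: {large} + {small}")
```

One possible reading of the output: a very lopsided split, with one large group and only a few points in the other, suggests that the final merge merely attaches outliers, whereas a more balanced split of unimodal data is a candidate false positive of the kind the abstract discusses.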
