Article

Overcoming weaknesses of density peak clustering using a data-dependent similarity measure

Journal

PATTERN RECOGNITION
Volume 137, Issue -, Pages -

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2022.109287

Keywords

Clustering; Density peak clustering; Similarity measure; Data-dependent similarity

Density Peak Clustering (DPC) is a popular clustering algorithm that uses pairwise similarity to detect arbitrarily shaped clusters. However, it is not robust for datasets whose clusters have different densities, and it is sensitive to changes in the units/scales used to represent the data. This paper proposes an effective data-dependent similarity measure called MP-Similarity and integrates it into DPC to create MP-DPC. The experiments show that MP-DPC outperforms DPC with the Euclidean distance and with existing data-dependent similarity measures, and that it is robust to changes in data scales.
Density Peak Clustering (DPC) is a popular state-of-the-art clustering algorithm, which requires pairwise (dis)similarity of data objects to detect arbitrarily shaped clusters. While it is shown to perform well for many applications, DPC remains: (i) not robust for datasets with clusters having different densities, and (ii) sensitive to the change in the units/scales used to represent data. These drawbacks are mainly due to the use of the data-independent similarity measure based on the Euclidean distance. In this paper, we address these issues by proposing an effective data-dependent similarity measure based on Probability Mass, which we call MP-Similarity, and by incorporating it in DPC to create MP-DPC, a data-dependent variant of DPC. We evaluate and compare MP-DPC against diverse baselines using several clustering metrics and datasets. Our experiments demonstrate that: (a) MP-DPC produces better clustering results than DPC using the Euclidean distance and existing data-dependent similarity measures; (b) MP-Similarity coupled with Shared-Nearest-Neighbor-based density metric in DPC further enhances the quality of clustering results; and (c) unlike DPC with existing data-independent and data-dependent similarity measures, MP-DPC is robust to the change in the units/scales used to represent data. Our findings suggest that MP-Similarity provides a more viable solution for DPC in datasets with unknown distribution or units/scales of features, which is often the case in many real-world applications. (c) 2022 Elsevier Ltd. All rights reserved.
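
For orientation, the baseline that the abstract criticises can be sketched as follows. This is a minimal illustration of standard DPC on a precomputed pairwise distance matrix, not the paper's MP-DPC: the function names `dpc_scores` and `dpc_cluster`, the cutoff parameter `d_c`, and the choice of centers by the largest product of density and separation are assumptions made for this sketch. MP-Similarity itself is not reproduced here because the abstract does not define it.

```python
import numpy as np


def dpc_scores(dist, d_c):
    """Compute the two DPC quantities from a pairwise distance matrix.

    rho[i]   : number of other points closer to i than the cutoff d_c
    delta[i] : distance from i to the nearest point that ranks denser
    (Hypothetical helper for this sketch; ties in rho are broken by index.)
    """
    n = dist.shape[0]
    rho = (dist < d_c).sum(axis=1) - 1  # subtract 1 to exclude the point itself
    order = np.argsort(-rho, kind="stable")  # densest first
    delta = np.zeros(n)
    nearest_denser = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour: use its largest distance.
            delta[i] = dist[i].max()
            continue
        denser = order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        delta[i] = dist[i, j]
        nearest_denser[i] = j
    return rho, delta, nearest_denser, order


def dpc_cluster(dist, d_c, n_clusters):
    """Pick centers by largest rho * delta, then propagate labels downhill."""
    rho, delta, nearest_denser, order = dpc_scores(dist, d_c)
    centers = np.argsort(-(rho * delta), kind="stable")[:n_clusters]
    labels = np.full(dist.shape[0], -1)
    labels[centers] = np.arange(n_clusters)
    for i in order:  # densest first, so the denser neighbour is already labelled
        if labels[i] != -1:
            continue
        if nearest_denser[i] == -1:
            # Fallback for the densest point if it was not chosen as a center.
            labels[i] = labels[centers[np.argmin(dist[i, centers])]]
        else:
            labels[i] = labels[nearest_denser[i]]
    return labels


# Two small, well-separated groups of points (toy data for illustration only).
blob = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5]], dtype=float)
X = np.vstack([blob, blob + 10.0])
dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
labels = dpc_cluster(dist, d_c=1.2, n_clusters=2)
print(labels)
```

Because `dist` here is the Euclidean distance, rescaling a single feature of `X` changes every entry of the matrix and therefore both `rho` and `delta`, which is the unit/scale sensitivity the abstract describes. Swapping in a different (dis)similarity only requires passing a different matrix to `dpc_cluster`.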
