Journal
PATTERN RECOGNITION
Volume 137, Issue -, Pages -
Publisher
ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2022.109287
Keywords
Clustering; Density peak clustering; Similarity measure; Data-dependent similarity
Density Peak Clustering (DPC) is a popular clustering algorithm that uses pairwise similarity to detect arbitrarily shaped clusters. However, it is not robust for datasets whose clusters have different densities, and it is sensitive to changes in the units/scales used to represent the data. This paper proposes a data-dependent similarity measure called MP-Similarity and integrates it into DPC to create MP-DPC. The experiments show that MP-DPC outperforms DPC with the Euclidean distance and with existing data-dependent similarity measures, and that it is robust to changes in data scales.
Density Peak Clustering (DPC) is a popular state-of-the-art clustering algorithm, which requires pairwise (dis)similarity of data objects to detect arbitrarily shaped clusters. While it is shown to perform well for many applications, DPC remains: (i) not robust for datasets with clusters having different densities, and (ii) sensitive to the change in the units/scales used to represent data. These drawbacks are mainly due to the use of the data-independent similarity measure based on the Euclidean distance. In this paper, we address these issues by proposing an effective data-dependent similarity measure based on Probability Mass, which we call MP-Similarity, and by incorporating it in DPC to create MP-DPC, a data-dependent variant of DPC. We evaluate and compare MP-DPC against diverse baselines using several clustering metrics and datasets. Our experiments demonstrate that: (a) MP-DPC produces better clustering results than DPC using the Euclidean distance and existing data-dependent similarity measures; (b) MP-Similarity coupled with a Shared-Nearest-Neighbor-based density metric in DPC further enhances the quality of clustering results; and (c) unlike DPC with existing data-independent and data-dependent similarity measures, MP-DPC is robust to the change in the units/scales used to represent data. Our findings suggest that MP-Similarity provides a more viable solution for DPC in datasets with unknown distribution or units/scales of features, which is often the case in many real-world applications. © 2022 Elsevier Ltd. All rights reserved.
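For readers unfamiliar with the baseline, the following is a minimal sketch of standard DPC with the Euclidean distance, i.e. the data-independent setting the abstract criticizes. It is not the paper's MP-DPC, and MP-Similarity is not implemented here because the abstract does not define it. The function name `dpc`, the Gaussian-kernel density, and the parameters `d_c` (cutoff distance) and `n_clusters` are illustrative assumptions, not taken from the source.

```python
import numpy as np

def dpc(points, d_c, n_clusters):
    """Baseline DPC sketch with Euclidean distance (hypothetical helper, not MP-DPC)."""
    n = len(points)
    # Pairwise Euclidean distances: the data-independent measure.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Local density: Gaussian-kernel sum over the other points (self term removed).
    rho = np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0
    order = np.argsort(-rho, kind="stable")  # indices sorted by decreasing density
    delta = np.zeros(n)                      # distance to the nearest denser point
    parent = np.full(n, -1)                  # index of that nearest denser point
    delta[order[0]] = dist[order[0]].max()   # convention for the densest point
    for rank in range(1, n):
        i = order[rank]
        denser = order[:rank]                # points ranked denser than i
        j = denser[np.argmin(dist[i, denser])]
        delta[i], parent[i] = dist[i, j], j
    # Cluster centers: the points with the largest rho * delta.
    gamma = rho * delta
    gamma[order[0]] = np.inf                 # the densest point is always a center
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Every other point inherits the label of its nearest denser neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels

# Illustrative usage on two synthetic, well-separated blobs (assumed data).
rng = np.random.default_rng(0)
demo = np.vstack([rng.normal(0.0, 0.3, (30, 2)), rng.normal(5.0, 0.3, (30, 2))])
demo_labels = dpc(demo, d_c=0.5, n_clusters=2)
```

Because `rho` and `delta` are both computed from `dist`, rescaling a single feature column (for example, changing its unit) changes every quantity the algorithm relies on. This is the scale sensitivity that the abstract attributes to the Euclidean distance.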