Article

Overcoming weaknesses of density peak clustering using a data-dependent similarity measure

Journal

PATTERN RECOGNITION
Volume 137

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2022.109287

Keywords

Clustering; Density peak clustering; Similarity measure; Data-dependent similarity

Density Peak Clustering (DPC) is a popular clustering algorithm that uses pairwise similarity to detect arbitrarily shaped clusters. However, it is not robust on datasets whose clusters have different densities, and it is sensitive to changes in the units/scales used to represent the data. This paper proposes a data-dependent similarity measure called MP-Similarity and integrates it into DPC to create MP-DPC. The experiments show that MP-DPC outperforms DPC with the Euclidean distance and with existing data-dependent similarity measures, and that it is robust to changes in data scale.
Density Peak Clustering (DPC) is a popular state-of-the-art clustering algorithm, which requires pairwise (dis)similarity of data objects to detect arbitrarily shaped clusters. While it is shown to perform well for many applications, DPC remains: (i) not robust for datasets with clusters having different densities, and (ii) sensitive to the change in the units/scales used to represent data. These drawbacks are mainly due to the use of the data-independent similarity measure based on the Euclidean distance. In this paper, we address these issues by proposing an effective data-dependent similarity measure based on Probability Mass, which we call MP-Similarity, and by incorporating it in DPC to create MP-DPC, a data-dependent variant of DPC. We evaluate and compare MP-DPC against diverse baselines using several clustering metrics and datasets. Our experiments demonstrate that: (a) MP-DPC produces better clustering results than DPC using the Euclidean distance and existing data-dependent similarity measures; (b) MP-Similarity coupled with Shared-Nearest-Neighbor-based density metric in DPC further enhances the quality of clustering results; and (c) unlike DPC with existing data-independent and data-dependent similarity measures, MP-DPC is robust to the change in the units/scales used to represent data. Our findings suggest that MP-Similarity provides a more viable solution for DPC in datasets with unknown distribution or units/scales of features, which is often the case in many real-world applications. © 2022 Elsevier Ltd. All rights reserved.
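
To make the role of the pairwise (dis)similarity concrete, the following is a minimal sketch of the standard DPC procedure written against a precomputed dissimilarity matrix. It is not the authors' MP-DPC code: this page does not define MP-Similarity, so the sketch uses the Euclidean baseline, and the names `euclidean_matrix`, `density_peak_clustering`, `d_c`, and `n_clusters` are illustrative assumptions. The point is only that DPC touches the data solely through the matrix `dist`, so a data-dependent measure could in principle be supplied in its place.

```python
import math


def euclidean_matrix(points):
    """Pairwise Euclidean distances (the data-independent baseline)."""
    return [[math.dist(p, q) for q in points] for p in points]


def density_peak_clustering(dist, d_c, n_clusters):
    """Sketch of standard DPC on a precomputed dissimilarity matrix.

    dist       -- symmetric n x n matrix of pairwise dissimilarities
    d_c        -- cutoff used for the local density (assumed parameter name)
    n_clusters -- number of density peaks to keep as cluster centres
    """
    n = len(dist)
    # Local density: number of other points closer than the cutoff.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    # Visit points from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    delta = [0.0] * n   # distance to the nearest denser point
    parent = [-1] * n   # index of that nearest denser point
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest point: by convention, use its largest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Centres are the points with the largest rho * delta.
    centres = sorted(order, key=lambda i: rho[i] * delta[i],
                     reverse=True)[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centres):
        labels[i] = c
    # Every other point inherits the label of its nearest denser point.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


# Two small, well-separated groups of 2-D points (toy data).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels = density_peak_clustering(euclidean_matrix(points),
                                 d_c=1.0, n_clusters=2)
print(labels)
```

Because both the density and the nearest-denser-point distance are read from `dist`, rescaling a single feature changes the Euclidean matrix and can therefore change the clustering, which is the sensitivity the abstract attributes to data-independent measures.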
