4.6 Article

Density Peaks Clustering Based on Jaccard Similarity and Label Propagation

期刊

COGNITIVE COMPUTATION
卷 13, 期 6, 页码 1609-1626

出版社

SPRINGER
DOI: 10.1007/s12559-021-09906-w

关键词

Density peaks clustering; Similarity measurement; Jaccard coefficient; Label propagation

资金

  1. National Natural Science Foundation of China [21606159, 62176176]

向作者/读者索取更多资源

Cognitive computing involves discovering hidden rules and patterns in massive volumes of data. Density peaks clustering (DPC) is a powerful data mining tool that can efficiently detect clusters of arbitrary shapes. By utilizing the Jaccard coefficient to measure point similarity and implementing a two-step allocation strategy, accurate identification of density peaks can be achieved.
Cognitive computing involves discovering hidden rules and patterns in massive volumes of data. Density peaks clustering (DPC) is a powerful data mining tool that can identify density peaks in decision graphs and assign labels to them without requiring iterations. It can efficiently and simply detect clusters of arbitrary shapes. However, on the one hand, density measurement using the epsilon neighbor or Gaussian kernel only reflects the global structure of the data, so that correct density peaks cannot be found, and performance on manifold datasets is weakened. On the other hand, the one-step allocation strategy results in chain reaction. Once a point with high density is misallocated, a series of points will be incorrectly assigned. To solve this problem, this paper proposes the Jaccard coefficient to measure the similarity between points. The proposed density measurement based on Jaccard coefficient is only related to the k points that share the max similarity with the given point, which can reflect the local structure of manifold datasets, and the density peaks can be identified accurately. Aiming at the chain reaction caused by the assignment strategy of DPC, we develop a two-step allocation strategy based on label propagation and the proposed measurement of similarity. The first step is to assign labels to points close to the clustering centers, where these are equal to labeled points in the label propagation algorithm. The second step is to complete the assignment of labels to the remaining points according to labeled data which is the nearest to each unassigned sample. We compared the proposed algorithm with four algorithms on synthetic datasets and real-world datasets. The three metrics among these algorithms show that the proposed algorithm outperforms other algorithms. The results of clustering on synthetic datasets verified the effectiveness of the proposed method for manifold datasets, and three metrics on the UCI datasets and the Olivetti Faces dataset show that it can reveal the patterns and associations of real-world datasets.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.6
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据