☆ 4.6 Article

Density Peaks Clustering Based on Jaccard Similarity and Label Propagation

COGNITIVE COMPUTATION (2021)

期刊

COGNITIVE COMPUTATION

卷 13, 期 6, 页码 1609-1626

出版社

SPRINGER

DOI: 10.1007/s12559-021-09906-w

关键词

Density peaks clustering; Similarity measurement; Jaccard coefficient; Label propagation

类别

Computer Science, Artificial Intelligence Neurosciences

资金

National Natural Science Foundation of China [21606159, 62176176]

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

智能总结 New
摘要

Cognitive computing involves discovering hidden rules and patterns in massive volumes of data. Density peaks clustering (DPC) is a powerful data mining tool that can efficiently detect clusters of arbitrary shapes. By utilizing the Jaccard coefficient to measure point similarity and implementing a two-step allocation strategy, accurate identification of density peaks can be achieved.

Cognitive computing involves discovering hidden rules and patterns in massive volumes of data. Density peaks clustering (DPC) is a powerful data mining tool that can identify density peaks in decision graphs and assign labels to them without requiring iterations. It can efficiently and simply detect clusters of arbitrary shapes. However, on the one hand, density measurement using the epsilon neighbor or Gaussian kernel only reflects the global structure of the data, so that correct density peaks cannot be found, and performance on manifold datasets is weakened. On the other hand, the one-step allocation strategy results in chain reaction. Once a point with high density is misallocated, a series of points will be incorrectly assigned. To solve this problem, this paper proposes the Jaccard coefficient to measure the similarity between points. The proposed density measurement based on Jaccard coefficient is only related to the k points that share the max similarity with the given point, which can reflect the local structure of manifold datasets, and the density peaks can be identified accurately. Aiming at the chain reaction caused by the assignment strategy of DPC, we develop a two-step allocation strategy based on label propagation and the proposed measurement of similarity. The first step is to assign labels to points close to the clustering centers, where these are equal to labeled points in the label propagation algorithm. The second step is to complete the assignment of labels to the remaining points according to labeled data which is the nearest to each unassigned sample. We compared the proposed algorithm with four algorithms on synthetic datasets and real-world datasets. The three metrics among these algorithms show that the proposed algorithm outperforms other algorithms. The results of clustering on synthetic datasets verified the effectiveness of the proposed method for manifold datasets, and three metrics on the UCI datasets and the Olivetti Faces dataset show that it can reveal the patterns and associations of real-world datasets.

Density Peaks Clustering Based on Jaccard Similarity and Label Propagation

期刊

COGNITIVE COMPUTATION

出版社

SPRINGER

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Density Peaks Clustering Based on Jaccard Similarity and Label Propagation

期刊

COGNITIVE COMPUTATION

出版社

SPRINGER

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文