☆ 4.7 Article

Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors

INFORMATION SCIENCES (2016)

期刊

INFORMATION SCIENCES

卷 354, 期 -, 页码 19-40

出版社

ELSEVIER SCIENCE INC

DOI: 10.1016/j.ins.2016.03.011

关键词

Local density; Density peaks; Clustering; K-nearest neighbors; Fuzzy weighted K-nearest neighbors

类别

Computer Science, Information Systems

资金

National Natural Science Foundation of China [31372250]
Key Science and Technology Program of Shaanxi Province of China [2013K12-03-24]
Fundamental Research Funds for the Central Universities [GK201503067]
International Science& Technology Cooperation Program of China [2011DFA12910]

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

Clustering by fast search and find of Density Peaks (referred to as DPC) was introduced by Alex Rodriguez and Alessandro Laio. The DPC algorithm is based on the idea that cluster centers are characterized by having a higher density than their neighbors and by being at a relatively large distance from points with higher densities. The power of DPC was demonstrated on several test cases. It can intuitively find the number of clusters and can detect and exclude the outliers automatically, while recognizing the clusters regardless of their shape and the dimensions of the space containing them. However, DPC does have some drawbacks to be addressed before it may be widely applied. First, the local density rho(i) of point i is affected by the cutoff distance d(c), and is computed in different ways depending on the size of datasets, which can influence the clustering, especially for small real-world cases. Second, the assignment strategy for the remaining points, after the density peaks (that is the cluster centers) have been found, can create a Domino Effect, whereby once one point is assigned erroneously, then there may be many more points subsequently mis-assigned. This is especially the case in real-word datasets where there could exist several clusters of arbitrary shape overlapping each other. To overcome these deficiencies, a robust clustering algorithm is proposed in this paper. To find the density peaks, this algorithm computes the local density rho(i) of point i relative to its K-nearest neighbors for any size dataset independent of the cutoff distance d(c), and assigns the remaining points to the most probable clusters using two new point assignment strategies. The first strategy assigns non-outliers by undertaking a breadth first search of the K-nearest neighbors of a point starting from cluster centers. The second strategy assigns outliers and the points unassigned by the first assignment procedure using the technique of fuzzy weighted K-nearest neighbors. The proposed clustering algorithm is benchmarked on publicly available synthetic and real-world datasets which are commonly used for testing the performance of clustering algorithms. The clustering results of the proposed algorithm are compared not only with that of DPC but also with that of several well known clustering algorithms including Affinity Propagation (AP), Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and K-means. The benchmarks used are: clustering accuracy (Acc), Adjusted Mutual Information (AMI) and Adjusted Rand Index (ARI). The experimental results demonstrate that our proposed clustering algorithm can find cluster centers, recognize clusters regardless of their shape and dimension of the space in which they are embedded, be unaffected by outliers, and can often outperform DPC, AP, DBSCAN and K-means. (C) 2016 Elsevier Inc. All rights reserved.

Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors

期刊

INFORMATION SCIENCES

出版社

ELSEVIER SCIENCE INC

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors

期刊

INFORMATION SCIENCES

出版社

ELSEVIER SCIENCE INC

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文