4.7 Article

Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors

期刊

INFORMATION SCIENCES
卷 354, 期 -, 页码 19-40

出版社

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2016.03.011

关键词

Local density; Density peaks; Clustering; K-nearest neighbors; Fuzzy weighted K-nearest neighbors

资金

  1. National Natural Science Foundation of China [31372250]
  2. Key Science and Technology Program of Shaanxi Province of China [2013K12-03-24]
  3. Fundamental Research Funds for the Central Universities [GK201503067]
  4. International Science& Technology Cooperation Program of China [2011DFA12910]

向作者/读者索取更多资源

Clustering by fast search and find of Density Peaks (referred to as DPC) was introduced by Alex Rodriguez and Alessandro Laio. The DPC algorithm is based on the idea that cluster centers are characterized by having a higher density than their neighbors and by being at a relatively large distance from points with higher densities. The power of DPC was demonstrated on several test cases. It can intuitively find the number of clusters and can detect and exclude the outliers automatically, while recognizing the clusters regardless of their shape and the dimensions of the space containing them. However, DPC does have some drawbacks to be addressed before it may be widely applied. First, the local density rho(i) of point i is affected by the cutoff distance d(c), and is computed in different ways depending on the size of datasets, which can influence the clustering, especially for small real-world cases. Second, the assignment strategy for the remaining points, after the density peaks (that is the cluster centers) have been found, can create a Domino Effect, whereby once one point is assigned erroneously, then there may be many more points subsequently mis-assigned. This is especially the case in real-word datasets where there could exist several clusters of arbitrary shape overlapping each other. To overcome these deficiencies, a robust clustering algorithm is proposed in this paper. To find the density peaks, this algorithm computes the local density rho(i) of point i relative to its K-nearest neighbors for any size dataset independent of the cutoff distance d(c), and assigns the remaining points to the most probable clusters using two new point assignment strategies. The first strategy assigns non-outliers by undertaking a breadth first search of the K-nearest neighbors of a point starting from cluster centers. The second strategy assigns outliers and the points unassigned by the first assignment procedure using the technique of fuzzy weighted K-nearest neighbors. The proposed clustering algorithm is benchmarked on publicly available synthetic and real-world datasets which are commonly used for testing the performance of clustering algorithms. The clustering results of the proposed algorithm are compared not only with that of DPC but also with that of several well known clustering algorithms including Affinity Propagation (AP), Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and K-means. The benchmarks used are: clustering accuracy (Acc), Adjusted Mutual Information (AMI) and Adjusted Rand Index (ARI). The experimental results demonstrate that our proposed clustering algorithm can find cluster centers, recognize clusters regardless of their shape and dimension of the space in which they are embedded, be unaffected by outliers, and can often outperform DPC, AP, DBSCAN and K-means. (C) 2016 Elsevier Inc. All rights reserved.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.7
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据