4.7 Article

Clustering with a new distance measure based on a dual-rooted tree

期刊

INFORMATION SCIENCES
卷 251, 期 -, 页码 96-113

出版社

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2013.05.040

关键词

Non-metric clustering; Minimal spanning tree; Prim's algorithm; Affinity measure; Co-association measure; Consensus clustering

资金

  1. PPF-ISSO Nice Sophia Antipolis University, France
  2. US National Science Foundation [CCR-0325571, CCF-0830490]
  3. Division of Computing and Communication Foundations
  4. Direct For Computer & Info Scie & Enginr [1217880] Funding Source: National Science Foundation

向作者/读者索取更多资源

This paper introduces a novel distance measure for clustering high dimensional data based on the hitting time of two Minimal Spanning Trees (MST) grown sequentially from a pair of points by Prim's algorithm. When the proposed measure is used in conjunction with spectral clustering, we obtain a powerful clustering algorithm that is able to separate neighboring non-convex shaped clusters and to account for local as well as global geometric features of the data set. Remarkably, the new distance measure is a true metric even if the Prim algorithm uses a non-metric dissimilarity measure to compute the edges of the MST. This metric property brings added flexibility to the proposed method. In particular, the method is applied to clustering non Euclidean quantities, such as probability distributions or spectra, using the Kullback-Leibler divergence as a base measure. We reduce computational complexity by applying consensus clustering to a small ensemble of dual rooted MSTs. We show that the resultant consensus spectral clustering with dual rooted MST is competitive with other clustering methods, both in terms of clustering performance and computational complexity. We illustrate the proposed clustering algorithm on public domain benchmark data for which the ground truth is known, on one hand, and on real-world astrophysical data on the other hand. (c) 2013 Elsevier Inc. All rights reserved.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.7
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据