Article

Fast hierarchical clustering and its validation

Journal

DATA & KNOWLEDGE ENGINEERING
Volume 44, Issue 1, Pages 109-138

Publisher

ELSEVIER SCIENCE BV
DOI: 10.1016/S0169-023X(02)00138-6

Keywords

clustering; validation; large and high-dimensional datasets; Voronoi diagram

Clustering is the task of grouping similar objects into clusters. A prominent and useful class of algorithms is hierarchical agglomerative clustering (HAC), which iteratively agglomerates the closest pair of clusters until all data points belong to one cluster. It outputs a dendrogram showing all N levels of agglomeration, where N is the number of objects in the dataset. However, HAC methods have several drawbacks: (1) high time and memory complexities for clustering, and (2) inefficient and inaccurate cluster validation. In this paper we show that these drawbacks can be alleviated by closely studying the dendrogram. Empirical study shows that most HAC algorithms follow a trend where, except for a number of top levels of the dendrogram, all lower levels agglomerate clusters that are very small in size and close in proximity to other clusters. Methods are proposed that exploit this characteristic to reduce the time and memory complexities significantly and to make validation very efficient and accurate. Analyses and experiments show the effectiveness of the proposed methods. © 2002 Elsevier Science B.V. All rights reserved.
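
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of naive HAC, not the accelerated method proposed in the paper. It assumes single linkage and Euclidean distance, neither of which the abstract specifies, and the function name naive_hac is hypothetical. It records one merge per dendrogram level and runs in roughly cubic time, which illustrates the cost drawback the abstract mentions.

```python
from itertools import combinations
from math import dist


def naive_hac(points):
    """Naive HAC sketch (assumed: single linkage, Euclidean distance).

    Returns one tuple per merge: (cluster_a, cluster_b, distance, new_size).
    """
    # Start with one singleton cluster per object, keyed by a cluster id.
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    next_id = len(points)

    while len(clusters) > 1:
        # Scan every pair of clusters for the smallest single-linkage distance.
        best = None
        for a, b in combinations(clusters, 2):
            d = min(
                dist(points[i], points[j])
                for i in clusters[a]
                for j in clusters[b]
            )
            if best is None or d < best[0]:
                best = (d, a, b)

        # Agglomerate the closest pair into a new cluster.
        d, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d, len(clusters[next_id])))
        next_id += 1

    return merges


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
    for level, merge in enumerate(naive_hac(data), start=1):
        print(level, merge)
```

In this toy input the early merges join tiny, tightly packed clusters and only the last merges span large distances, which is the kind of dendrogram trend the abstract describes exploiting.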
