Article

Towards understanding hierarchical clustering: A data distribution perspective

NEUROCOMPUTING (2009)

Journal

NEUROCOMPUTING

Volume 72, Issue 10-12, Pages 2319-2330

Publisher

ELSEVIER

DOI: 10.1016/j.neucom.2008.12.011

Keywords

Hierarchical clustering; F-measure; Measure normalization; Unweighted pair group method with arithmetic mean (UPGMA); Coefficient of variation (CV)

Category

Computer Science, Artificial Intelligence

Funding

National Natural Science Foundation of China (NSFC) [70621061, 70890082]

Abstract

Hierarchical clustering is an important category of clustering methods. Considerable research effort has focused on algorithm-level improvements of the hierarchical clustering process. In this paper, our goal is to provide a systematic understanding of hierarchical clustering from a data distribution perspective. Specifically, we investigate how the true cluster distribution affects clustering performance, and how hierarchical clustering schemes and validation measures relate to each other under different data distributions. To this end, we provide an organized study of these issues. One of our key findings is that hierarchical clustering tends to produce clusters with high variation in cluster sizes regardless of the true cluster distribution. Our results also show that the F-measure, an external clustering validation measure, is biased towards hierarchical clustering algorithms that tend to increase the variation in cluster sizes. In light of this, we propose F-norm, the normalized version of the F-measure, to solve the cluster validation problem for hierarchical clustering. Experimental results show that F-norm is more suitable than the unnormalized F-measure for evaluating hierarchical clustering results across data sets with different data distributions. (c) 2009 Elsevier B.V. All rights reserved.
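
The two quantities the abstract relies on, the variation in cluster sizes and the F-measure, can be illustrated with a small sketch. This is not the authors' code. It assumes that the coefficient of variation (CV) is the sample standard deviation of the cluster sizes divided by their mean, and that the F-measure is the commonly used class-size-weighted best-match form. The function names and the toy labels are hypothetical. F-norm is not shown because the abstract does not define it.

```python
from collections import Counter
from statistics import mean, stdev


def cluster_size_cv(labels):
    """Coefficient of variation of cluster sizes (assumed: sample std / mean)."""
    sizes = list(Counter(labels).values())
    if len(sizes) < 2:
        return 0.0  # a single cluster has no size variation
    return stdev(sizes) / mean(sizes)


def f_measure(true_labels, pred_labels):
    """Class-size-weighted best-match F-measure (assumed definition)."""
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(pred_labels)
    overlap = Counter(zip(true_labels, pred_labels))
    total = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_ck = overlap.get((c, k), 0)
            if n_ck == 0:
                continue  # no shared objects, so F is zero for this pair
            precision = n_ck / n_k
            recall = n_ck / n_c
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_c / n) * best  # weight each class by its share of objects
    return total


# Toy data (hypothetical): three true classes with 4 objects each.
true_labels = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
# A hypothetical skewed clustering: one large cluster and two small ones.
skewed = [0] * 8 + [1, 1, 1, 2]

cv_true = cluster_size_cv(true_labels)
cv_skewed = cluster_size_cv(skewed)
f_skewed = f_measure(true_labels, skewed)
print(cv_true, cv_skewed, f_skewed)
```

Comparing the CV of the true class sizes with the CV of the produced cluster sizes is one simple way to see whether a clustering has increased the variation in cluster sizes, which is the effect the abstract attributes to hierarchical clustering.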
