Article

A scalable framework for cluster ensembles

PATTERN RECOGNITION (2009)

Journal

PATTERN RECOGNITION

Volume 42, Issue 5, Pages 676-688

Publisher

ELSEVIER SCI LTD

DOI: 10.1016/j.patcog.2008.09.027

Keywords

Clustering; Hard/fuzzy-k-means; Large data sets; Ensemble; Scalability; Single pass algorithm

Categories

Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic

Funding

National Institutes of Health [1 R01 EB00822-01]

Abstract

An ensemble of clustering solutions or partitions may be generated for a number of reasons. If the data set is very large, clustering may be done on tractably sized disjoint subsets. The data may also be distributed across different sites, in which case a distributed clustering solution with a final merging of partitions is a natural fit. In this paper, two new approaches to combining partitions, represented by sets of cluster centers, are introduced. The advantage of these approaches is that they provide a final partition of data that is comparable to the best existing approaches, yet scale to extremely large data sets. They can be 100,000 times faster while using much less memory. The new algorithms are compared against the best existing cluster ensemble merging approaches, clustering all the data at once, and a clustering algorithm designed for very large data sets. The comparison is done for fuzzy and hard k-means-based clustering algorithms. It is shown that the centroid-based ensemble merging algorithms presented here generate partitions of quality comparable to the best label vector approach or clustering all the data at once, while providing very large speedups. (C) 2008 Elsevier Ltd. All rights reserved.
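The abstract does not spell out the two merging algorithms. As a rough illustration of the general idea it describes (cluster tractable, disjoint subsets, keep only each subset's cluster centers, then merge those centers into a final partition), here is a minimal Python sketch. It assumes hard k-means in both stages and merges by running k-means over all subset centers, weighting each center by the number of points it summarizes. The function names (`weighted_kmeans`, `cluster_subset`, `merge_partitions`, `demo`) and this merge rule are hypothetical and are not the paper's algorithms.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Point, centers: Sequence[Point]) -> int:
    """Index of the center closest to p."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))


def weighted_kmeans(points: Sequence[Point], weights: Sequence[float], k: int,
                    iters: int = 100, seed: int = 0) -> List[Point]:
    """Lloyd's hard k-means in which every point carries a weight."""
    rng = random.Random(seed)
    dim = len(points[0])
    centers = [tuple(c) for c in rng.sample(list(points), k)]
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            j = nearest(p, centers)
            totals[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        # Weighted mean of each cluster; an empty cluster keeps its old center.
        new_centers = [tuple(s / totals[j] for s in sums[j]) if totals[j] > 0
                       else centers[j] for j in range(k)]
        if new_centers == centers:  # assignments are stable: converged
            break
        centers = new_centers
    return centers


def cluster_subset(subset: Sequence[Point], k: int, seed: int):
    """Cluster one disjoint subset; return its centers and cluster sizes."""
    centers = weighted_kmeans(subset, [1.0] * len(subset), k, seed=seed)
    sizes = [0] * k
    for p in subset:
        sizes[nearest(p, centers)] += 1
    return centers, sizes


def merge_partitions(partitions, k: int, seed: int = 0) -> List[Point]:
    """Hypothetical merge: weighted k-means over all subset centers.

    Each center is weighted by its cluster size, so the raw data points
    are never revisited during the merge.
    """
    centers, weights = [], []
    for subset_centers, sizes in partitions:
        for c, s in zip(subset_centers, sizes):
            if s > 0:  # skip empty clusters
                centers.append(c)
                weights.append(float(s))
    return weighted_kmeans(centers, weights, k, seed=seed)


def demo(n_chunks: int = 4, seed: int = 42) -> List[Point]:
    """Two Gaussian blobs near (0, 0) and (10, 10), split into disjoint chunks."""
    rng = random.Random(seed)
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    data += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(200)]
    rng.shuffle(data)
    size = len(data) // n_chunks
    chunks = [data[i * size:(i + 1) * size] for i in range(n_chunks)]
    partitions = [cluster_subset(c, k=2, seed=i) for i, c in enumerate(chunks)]
    return sorted(merge_partitions(partitions, k=2))


if __name__ == "__main__":
    print(demo())
```

Weighting by cluster size is the key design choice here: the size-weighted mean of a group of hard-cluster centers equals the mean of all points in those clusters, so the merge step only has to process k centers per subset. The fuzzy k-means variant mentioned in the abstract would replace hard assignments with membership weights and is not shown.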