Scalable K-Means++

PROCEEDINGS OF THE VLDB ENDOWMENT (2012)

Journal

PROCEEDINGS OF THE VLDB ENDOWMENT

Volume 5, Issue 7, Pages 622-633

Publisher

ASSOC COMPUTING MACHINERY

DOI: 10.14778/2180912.2180915

Categories

Computer Science, Information Systems; Computer Science, Theory & Methods

Funding

National Science Foundation, Directorate for Computer and Information Science and Engineering, Division of Computing and Communication Foundations [1016684]
National Science Foundation, Directorate for Computer and Information Science and Engineering, Division of Information and Intelligent Systems [0915040]

Abstract

Over half a century old and showing no signs of aging, k-means remains one of the most popular data processing algorithms. As is well known, a proper initialization of k-means is crucial for obtaining a good final solution. The recently proposed k-means++ initialization algorithm achieves this, obtaining an initial set of centers that is provably close to the optimum solution. A major downside of k-means++ is its inherently sequential nature, which limits its applicability to massive data: one must make k passes over the data to find a good initial set of centers. In this work we show how to drastically reduce the number of passes needed to obtain, in parallel, a good initialization. This is unlike prevailing efforts on parallelizing k-means, which have mostly focused on the post-initialization phases of k-means. We prove that our proposed initialization algorithm k-means|| obtains a nearly optimal solution after a logarithmic number of passes, and then show that in practice a constant number of passes suffices. Experimental evaluation on real-world large-scale data demonstrates that k-means|| outperforms k-means++ in both sequential and parallel settings.
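
The contrast the abstract draws between k passes and a small number of passes can be illustrated with a rough sketch. The abstract does not spell out the sampling scheme, so the code below follows the commonly described form of k-means|| (each round samples every point independently with probability proportional to its squared distance from the current centers, then the oversampled candidates are weighted and reduced to k). The function names, the oversampling factor `2 * k`, and the default of 5 rounds are assumptions made for illustration, not values taken from this record.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_sq_dists(points, centers):
    """One pass over the data: squared distance of each point to its nearest center."""
    return [min(sq_dist(p, c) for c in centers) for p in points]


def kmeanspp_init(points, k, rng, weights=None):
    """Sequential k-means++ seeding: one weighted draw per pass, so up to k passes."""
    if weights is None:
        weights = [1.0] * len(points)
    centers = [rng.choices(points, weights=weights, k=1)[0]]
    while len(centers) < k:
        d2 = nearest_sq_dists(points, centers)
        probs = [w * d for w, d in zip(weights, d2)]
        if sum(probs) == 0:  # every remaining point coincides with a chosen center
            break
        centers.append(rng.choices(points, weights=probs, k=1)[0])
    return centers


def kmeans_parallel_init(points, k, rounds=5, oversample=None, seed=0):
    """Sketch of k-means||-style seeding; `rounds` and `oversample` are assumed defaults."""
    rng = random.Random(seed)
    ell = oversample if oversample is not None else 2 * k  # assumed oversampling factor
    centers = [rng.choice(points)]
    for _ in range(rounds):
        d2 = nearest_sq_dists(points, centers)  # this pass can be split across machines
        cost = sum(d2)
        if cost == 0:
            break
        # Points are sampled independently of each other, so many can be added per pass.
        centers += [p for p, d in zip(points, d2)
                    if rng.random() < min(1.0, ell * d / cost)]
    # Weight each candidate by the number of data points closest to it.
    weights = [0] * len(centers)
    for p in points:
        j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        weights[j] += 1
    # The candidate set is small, so reducing it to k centers sequentially is cheap.
    return kmeanspp_init(centers, k, rng, weights=weights)
```

Calling `kmeans_parallel_init(data, 3)` on a list of coordinate tuples returns up to three of those tuples as initial centers. The point of the design is that the number of passes over the full data is `rounds` rather than k, while the final sequential step touches only the small candidate set.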
