Proceedings Paper

Fast k-means based on k-NN Graph

2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE) (2018)

Journal

2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE)

Volume -, Issue -, Pages 1220-1223

Publisher

IEEE

DOI: 10.1109/ICDE.2018.00115

Categories

Computer Science, Information Systems; Computer Science, Theory & Methods

Funding

National Natural Science Foundation of China [61572408]

Abstract

In the big data era, k-means clustering has been widely adopted as a basic processing tool in various contexts. However, its computational cost can be prohibitively high when both the data size and the number of clusters are large. The processing bottleneck of k-means lies in searching for the closest centroid for every sample in each iteration. This paper presents a novel solution to the scalability issue of k-means. In the proposed approach, k-means is supported by an approximate k-nearest neighbor graph. In each k-means iteration, a data sample is compared only to the clusters in which its nearest neighbors reside. Since the number of nearest neighbors considered is much smaller than k, the cost of this step becomes minor and independent of k, which removes the processing bottleneck. Most interestingly, the k-nearest neighbor graph is itself constructed by calling the fast k-means. Compared with existing fast k-means variants, the proposed algorithm achieves a speed-up of hundreds to thousands of times while maintaining high clustering quality. When tested on 10 million 512-dimensional data samples, it takes only 5.2 hours to produce 1 million clusters. In contrast, traditional k-means would take 3 years to complete clustering at the same scale.
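The graph-restricted assignment step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`brute_force_knn_graph`, `update_centroids`, `graph_kmeans`), the random-label initialization, and the brute-force graph construction are all assumptions made for illustration. In particular, the paper builds an approximate graph by calling the fast k-means itself, which is not reproduced here.

```python
import numpy as np

def brute_force_knn_graph(X, n_neighbors):
    """Exact k-NN graph by brute force (illustrative stand-in only;
    the paper uses an approximate graph built differently)."""
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)          # a sample is not its own neighbor
    return np.argsort(d2, axis=1)[:, :n_neighbors]

def update_centroids(X, labels, k):
    """Mean of each cluster; empty clusters keep a zero placeholder."""
    counts = np.bincount(labels, minlength=k)
    sums = np.zeros((k, X.shape[1]))
    np.add.at(sums, labels, X)            # unbuffered scatter-add per label
    centroids = np.zeros_like(sums)
    nonempty = counts > 0
    centroids[nonempty] = sums[nonempty] / counts[nonempty, None]
    return centroids

def graph_kmeans(X, k, knn, n_iter=10, seed=0):
    """k-means where each sample is compared only to the clusters that
    currently contain itself or one of its graph neighbors."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=X.shape[0])   # assumed initialization
    for _ in range(n_iter):
        centroids = update_centroids(X, labels, k)
        new_labels = labels.copy()
        for i in range(X.shape[0]):
            # Candidate clusters: own cluster plus neighbors' clusters.
            # At most n_neighbors + 1 candidates, regardless of k.
            cand = np.unique(np.append(labels[knn[i]], labels[i]))
            d2 = ((centroids[cand] - X[i]) ** 2).sum(axis=1)
            new_labels[i] = cand[np.argmin(d2)]
        if np.array_equal(new_labels, labels):
            break                          # no sample moved
        labels = new_labels
    return labels, update_centroids(X, labels, k)
```

In this sketch the inner loop evaluates at most `n_neighbors + 1` centroid distances per sample, so the per-iteration assignment cost does not grow with `k`. The brute-force graph itself costs quadratic time in the number of samples and is only suitable for small demonstrations.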
