4.5 Article

Large scale K-means clustering using GPUs

期刊

DATA MINING AND KNOWLEDGE DISCOVERY
卷 37, 期 1, 页码 67-109

出版社

SPRINGER
DOI: 10.1007/s10618-022-00869-6

关键词

k-means; GPGPU; Data mining; Clustering

向作者/读者索取更多资源

The paper presents a fast and memory-efficient GPU-based algorithm called ASB k-means for exact k-means clustering. The algorithm can handle large datasets that exceed the GPU memory size, and it outperforms the standard GPU-based k-means implementation in terms of speed and performance.
The k-means algorithm is widely used for clustering, compressing, and summarizing vector data. We present a fast and memory-efficient GPU-based algorithm for exact k-means, Asynchronous Selective Batched K-means (ASB K-means). Unlike most GPU-based k-means algorithms that require loading the whole dataset onto the GPU for clustering, the amount of GPU memory required to run our algorithm can be chosen to be much smaller than the size of the whole dataset. Thus, our algorithm can cluster datasets whose size exceeds the available GPU memory. The algorithm works in a batched fashion and applies the triangle inequality in each k-means iteration to omit a data point if its membership assignment, i.e., the cluster it belongs to, remains unchanged, thus significantly reducing the number of data points that need to be transferred between the CPU's RAM and the GPU's global memory and enabling the algorithm to very efficiently process large datasets. Our algorithm can be substantially faster than a GPU-based implementation of standard k-means even in situations when application of the standard algorithm is feasible because the whole dataset fits into GPU memory. Experiments show that ASB K-means can run up to 15x times faster than a standard GPU-based implementation of k-means, and it also outperforms the GPU-based k-means implementation in NVIDIA's open-source RAPIDS machine learning library on all the datasets used in our experiments.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.5
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据