Article

Hybrid CPU/GPU/APU accelerated query, insert, update and erase operations in hash tables with string keys

Journal

KNOWLEDGE AND INFORMATION SYSTEMS
Volume 65, Issue 10, Pages 4359-4377

Publisher

SPRINGER LONDON LTD
DOI: 10.1007/s10115-023-01891-w

Keywords

Hybrid; Hash table; Strings; GPU; APU; SYCL

Modern computer systems can achieve significant performance improvements by using various types of hardware acceleration. Different accelerators require different optimized data structures and memory configurations to achieve the best performance. APUs, which combine CPU and integrated GPU, support shared memory and enable CPU and iGPU collaboration on pointer-based data structures.
Modern computer systems can use different types of hardware acceleration to achieve massive performance improvements. Some accelerators, such as FPGAs and dedicated GPUs (dGPUs), need optimized data structures for the best performance and often use dedicated memory. In contrast, APUs, which combine a CPU and an integrated GPU (iGPU), support shared memory and allow the iGPU to work together with the CPU on pointer-based data structures. First, we develop an approach for the dGPU that accelerates queries in libcuckoo and robin-map; we then look at accelerating insert, update and erase operations in the original libcuckoo using OneAPI on an APU. We evaluate the dGPU approach against the CPU variants, against our dGPU approach adapted for the CPU, and in a hybrid context in which longer keys are processed on the CPU and shorter keys on the dGPU. In comparison with the original libcuckoo algorithm, our dGPU approach achieves a speed-up of 2.1, and in comparison with robin-map a speed-up of 1.5. For hybrid workloads, our approach is efficient if long keys are processed on the CPU and short keys are processed on the dGPU. By processing a mixture of 20% long keys on the CPU and 80% short keys on the dGPU, our hybrid approach achieves a 40% higher throughput than the CPU-only approach. In addition, we develop a hybrid APU approach for insert, update and erase operations in the original libcuckoo structure. It focuses on shared memory and uses the iGPU to accelerate the look-ups of the positions at which the insert, update and erase operations take place.
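
The hybrid split described in the abstract (long keys on the CPU, short keys on the dGPU) can be illustrated with a minimal C++ sketch. It is not the authors' implementation: the names `PartitionedBatch`, `partition_by_length` and `count_hits`, as well as the length threshold, are hypothetical, and `std::unordered_map` merely stands in for libcuckoo or robin-map. The sketch only shows how a query batch might be divided by key length before each part is handed to its device.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical result of splitting a query batch by key length.
struct PartitionedBatch {
    std::vector<std::string> cpu_keys;  // long keys, kept on the CPU
    std::vector<std::string> gpu_keys;  // short keys, candidates for the dGPU
};

// Split keys by an assumed length threshold in bytes. The threshold is
// illustrative only; the abstract does not state a concrete value.
PartitionedBatch partition_by_length(const std::vector<std::string>& keys,
                                     std::size_t max_gpu_key_len) {
    PartitionedBatch batch;
    for (const auto& key : keys) {
        if (key.size() <= max_gpu_key_len) {
            batch.gpu_keys.push_back(key);
        } else {
            batch.cpu_keys.push_back(key);
        }
    }
    // Every key lands in exactly one partition.
    assert(batch.cpu_keys.size() + batch.gpu_keys.size() == keys.size());
    return batch;
}

// Stand-in for a lookup path: counts how many keys are present in the table.
// In a real hybrid setup, the gpu_keys part would be submitted to a device
// queue instead of being processed by this host loop.
std::size_t count_hits(const std::unordered_map<std::string, int>& table,
                       const std::vector<std::string>& keys) {
    std::size_t hits = 0;
    for (const auto& key : keys) {
        hits += table.count(key);
    }
    return hits;
}
```

Under these assumptions, a caller would run `count_hits` (or its device counterpart) on each partition concurrently and sum the results; the ratio of short to long keys then determines how much work each device receives.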
