Article

Heterogeneous sparse matrix-vector multiplication via compressed sparse row format

Journal

PARALLEL COMPUTING
Volume 115

Publisher

ELSEVIER
DOI: 10.1016/j.parco.2023.102997

Keywords

Sparse linear algebra; Sparse matrix-vector multiplication; GPU; Heterogeneous; OpenMP; CUDA

This paper introduces a heterogeneous format called CSR-k based on CSR, which achieves high-performance SpMV execution on different devices by reordering and grouping rows into hierarchical structures. It outperforms Intel MKL, NVIDIA cuSPARSE, and Sandia National Laboratories' KokkosKernels for regular sparse matrices.
Sparse matrix-vector multiplication (SpMV) is one of the most important kernels in high-performance computing (HPC), yet it typically performs poorly on many devices. As a result, SpMV usually requires special care in how the matrix is stored and how the kernel is tuned for a given device. Moreover, HPC is moving toward heterogeneous hardware that combines several different compute units, e.g., many-core CPUs and GPUs. An emerging goal has therefore been to produce heterogeneous formats and methods that allow critical kernels, e.g., SpMV, to be executed on different devices with portable performance and minimal changes to format and method. This paper presents a heterogeneous format based on CSR, named CSR-k, that can be tuned quickly and exceeds the average performance of Intel MKL on Intel Xeon Platinum 838 and AMD Epyc 7742 CPUs while still outperforming NVIDIA's cuSPARSE and Sandia National Laboratories' KokkosKernels on NVIDIA A100 and V100 for regular sparse matrices, i.e., sparse matrices where the number of nonzeros per row has a variance <= 10, such as those commonly generated from two- and three-dimensional finite difference and finite element problems. In particular, CSR-k achieves this by reordering the matrix and by grouping rows into a hierarchical structure of super-rows and super-super-rows that are represented by just a few extra arrays of pointers. Because the format is this simple, a model can be tuned for a device and then used to select super-row and super-super-row sizes in constant time.
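
The hierarchy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes k = 3 (rows, super-rows, super-super-rows), fixed group sizes, and no reordering step. The function names (`is_regular`, `build_group_ptr`, `spmv_csr3`, `laplacian_1d_csr`) and the use of population variance for the regularity test are assumptions made only for illustration.

```python
from statistics import pvariance


def laplacian_1d_csr(n):
    """Build the CSR arrays of the n x n tridiagonal matrix [-1, 2, -1]."""
    row_ptr, col_idx, vals = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                col_idx.append(j)
                vals.append(v)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, vals


def is_regular(row_ptr, max_variance=10.0):
    """Regularity test: variance of nonzeros per row <= 10.

    Assumption: population variance; the abstract does not say which one.
    """
    nnz_per_row = [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]
    return pvariance(nnz_per_row) <= max_variance


def build_group_ptr(n_items, group_size):
    """Pointer array splitting n_items consecutive items into fixed-size groups."""
    ptr = list(range(0, n_items, group_size))
    ptr.append(n_items)
    return ptr


def spmv_csr3(ssr_ptr, sr_ptr, row_ptr, col_idx, vals, x):
    """Compute y = A @ x by walking super-super-rows, super-rows, then rows."""
    y = [0.0] * (len(row_ptr) - 1)
    for ssr in range(len(ssr_ptr) - 1):                   # coarsest level
        for sr in range(ssr_ptr[ssr], ssr_ptr[ssr + 1]):  # super-rows in it
            for row in range(sr_ptr[sr], sr_ptr[sr + 1]):  # rows in super-row
                acc = 0.0
                for k in range(row_ptr[row], row_ptr[row + 1]):
                    acc += vals[k] * x[col_idx[k]]
                y[row] = acc
    return y


# Usage: 6 rows, super-rows of 2 rows, super-super-rows of 2 super-rows.
row_ptr, col_idx, vals = laplacian_1d_csr(6)
sr_ptr = build_group_ptr(6, 2)                  # [0, 2, 4, 6]
ssr_ptr = build_group_ptr(len(sr_ptr) - 1, 2)   # [0, 2, 3]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(is_regular(row_ptr), spmv_csr3(ssr_ptr, sr_ptr, row_ptr, col_idx, vals, x))
```

The underlying CSR arrays are left untouched; only the two pointer arrays `sr_ptr` and `ssr_ptr` are added. In a parallel setting the outer loops would plausibly be mapped to coarser units of work (for example OpenMP threads or CUDA thread blocks), but that mapping is not shown here.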
