
An efficient sparse stiffness matrix vector multiplication using compressed sparse row storage format on AMD GPU

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE (2022)

Journal

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE

Volume 34, Issue 23

Publisher

WILEY

DOI: 10.1002/cpe.7186

Keywords

AMD GPU; HPC; performance acceleration; sparse stiffness matrix-vector multiplication

Funding

National Key R&D Program of China [2017YFB0202303]

Automated Summary

The performance of sparse stiffness matrix-vector multiplication is crucial for large-scale numerical simulation in structural mechanics. This article introduces a new algorithm, CSR-vector row, that provides fine-grained computing optimization for sparse stiffness matrices on AMD GPUs. It combines efficient reduce operations with deep memory access optimization to improve computing performance.

Abstract

The performance of sparse stiffness matrix-vector multiplication is essential for large-scale numerical simulation in structural mechanics. Compressed sparse row (CSR) is the most common format for storing sparse stiffness matrices. However, because a stiffness matrix is highly sparse, the number of nonzero elements per row is very small. As a result, the CSR-scalar algorithm, the light algorithm, and the HOLA algorithm leave some GPU threads idle during the calculation, which both lowers computing performance and wastes computing resources. In this article, a new algorithm, CSR-vector row, is proposed for fine-grained computing optimization based on the AMD GPU architecture on heterogeneous supercomputers. The algorithm assigns a vector to calculate each row, with the vector chosen according to the number of nonzero elements of the stiffness matrix. CSR-vector row features efficient reduce operations, deep memory access optimization, and a kernel function configuration scheme that better overlaps memory access and calculation. The memory access bandwidth of the algorithm on an AMD GPU exceeds 700 GB/s. Compared with the CSR-scalar algorithm, the parallel efficiency of CSR-vector row improves by a factor of 7.2, and its floating-point computing performance is 41%-95% higher than that of the light algorithm and the HOLA algorithm. In addition, when CSR-vector row is applied to examples from CFD, electromagnetics, quantum chemistry, power networks, and semiconductor processes, both memory access bandwidth and double-precision floating-point performance improve compared with rocSPARSE-CSR-vector.
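To make the contrast between one thread per row and one vector per row concrete, the following is a minimal CPU-side sketch in Python. It is not the paper's GPU kernel: the function names, the power-of-two width heuristic in `pick_vector_width`, and the cap of 64 lanes (chosen here to match a 64-wide wavefront) are all assumptions made for illustration. The sketch only shows the general idea of splitting a row's nonzeros across several lanes and combining their partial sums with a tree reduction.

```python
# Illustrative CPU simulation only; the lanes run sequentially here,
# whereas on a GPU they would execute in parallel.

def csr_spmv_scalar(row_ptr, col_idx, vals, x):
    """CSR-scalar style: a single (simulated) thread walks each row."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[row] = acc
    return y


def pick_vector_width(row_ptr, max_width=64):
    """Hypothetical heuristic: smallest power of two that is at least
    the mean number of nonzeros per row, capped at max_width."""
    n_rows = len(row_ptr) - 1
    mean_nnz = row_ptr[-1] / max(n_rows, 1)
    width = 1
    while width < mean_nnz and width < max_width:
        width *= 2
    return width


def csr_spmv_vector(row_ptr, col_idx, vals, x, width):
    """CSR-vector style: `width` lanes (a power of two) share one row,
    then a tree reduction combines their partial sums."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        start, end = row_ptr[row], row_ptr[row + 1]
        partial = [0.0] * width
        # Each lane handles nonzeros start+lane, start+lane+width, ...
        for lane in range(width):
            for k in range(start + lane, end, width):
                partial[lane] += vals[k] * x[col_idx[k]]
        # Tree reduction: halve the number of active lanes each step.
        stride = width // 2
        while stride > 0:
            for lane in range(stride):
                partial[lane] += partial[lane + stride]
            stride //= 2
        y[row] = partial[0]
    return y
```

With a small width, few lanes sit idle on rows that hold only a handful of nonzeros, which is the situation the abstract describes for stiffness matrices. How the paper actually selects the vector size and implements the reduction is not stated in the abstract, so the heuristic above should be read as a placeholder.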
