Article

Pushing memory bandwidth limitations through efficient implementations of Block-Krylov space solvers on GPUs

Journal

COMPUTER PHYSICS COMMUNICATIONS
Volume 233, Pages 29-40

Publisher

ELSEVIER SCIENCE BV
DOI: 10.1016/j.cpc.2018.06.019

Keywords

Block solver; GPU

Funding

  1. U.S. Department of Energy, Office of Science, Office of High Energy Physics [DE-AC02-07CH11359]
  2. U.S. National Science Foundation [PHY14-14614]
  3. Exascale Computing Project [17-SC-20-SC]
  4. U.S. Department of Energy Office of Science
  5. National Nuclear Security Administration
  6. ORNL

The cost of iteratively solving a sparse linear system against multiple right-hand-side vectors is a common challenge within scientific computing. Many algorithmic advances, such as eigenvector deflation and domain-specific multi-grid algorithms, have been widely beneficial in reducing this cost. However, they do not address the intrinsic memory-bandwidth constraints of the matrix-vector operation that dominates iterative solvers. Batching this operation for multiple vectors and exploiting cache and register blocking can yield a super-linear speedup. Block-Krylov solvers can naturally take advantage of such batched matrix-vector operations, further reducing the iterations to solution by sharing the Krylov space between solves. Practical implementations, however, typically suffer from quadratic scaling in the number of vector-vector operations. We present an implementation of the block Conjugate Gradient algorithm on NVIDIA GPUs which reduces the memory-bandwidth complexity of vector-vector operations from quadratic to linear. As a representative case, we consider the domain of lattice quantum chromodynamics and present results for one of the fermion discretizations. Using the QUDA library as a framework, we demonstrate a 5x speedup compared to highly optimized independent Krylov solves on NVIDIA's SaturnV cluster. © 2018 Elsevier B.V. All rights reserved.
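
The batching idea in the abstract can be illustrated with a minimal sketch. It is not the paper's GPU implementation and does not use the QUDA API. The function name `csr_matvec_batched`, the CSR layout, and the small tridiagonal test matrix are all assumptions made for illustration. The sketch counts how many matrix entries are read when a sparse matrix is applied to several vectors at once, compared with applying it to each vector separately.

```python
def csr_matvec_batched(indptr, indices, data, X):
    """Apply a CSR matrix to a block of vectors (hypothetical helper).

    X[j][k] is entry j of right-hand side k.
    Returns (Y, matrix_loads), where matrix_loads counts matrix entries read.
    """
    n_rows = len(indptr) - 1
    n_rhs = len(X[0])
    Y = [[0.0] * n_rhs for _ in range(n_rows)]
    matrix_loads = 0
    for i in range(n_rows):
        for p in range(indptr[i], indptr[i + 1]):
            a = data[p]               # one matrix entry is loaded once ...
            j = indices[p]
            matrix_loads += 1
            for k in range(n_rhs):    # ... and reused for every right-hand side
                Y[i][k] += a * X[j][k]
    return Y, matrix_loads


# Illustrative 4x4 tridiagonal matrix (2 on the diagonal, -1 off the diagonal).
indptr = [0, 2, 5, 8, 10]
indices = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
data = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]

# Two right-hand sides stored side by side: columns [1,1,1,1] and [0,1,0,1].
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]]

# Batched application: the matrix is streamed once for both vectors.
Y, batched_loads = csr_matvec_batched(indptr, indices, data, X)

# Independent application: the matrix is streamed once per vector.
separate_loads = 0
for k in range(len(X[0])):
    column = [[row[k]] for row in X]
    _, loads = csr_matvec_batched(indptr, indices, data, column)
    separate_loads += loads

print(batched_loads, separate_loads)
```

In this toy setting the matrix traffic stays fixed as right-hand sides are added, while the arithmetic grows with the number of vectors, which is the reuse that the abstract's cache and register blocking exploits. The vector traffic and the block vector-vector operations that the paper addresses are not modeled here.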
