Journal
COMPUTER PHYSICS COMMUNICATIONS
Volume 181, Issue 9, Pages 1517-1528
Publisher
ELSEVIER
DOI: 10.1016/j.cpc.2010.05.002
Keywords
CUDA; GPGPU; GPU; Lattice QCD; Mixed precision
Funding
- US DOE [DE-FG02-91ER40676, DE-FC02-06ER41440]
- NSF [DGE-0221680, PHY-0427646, PHY-0835713, OCI-0749300]
- Science and Technology Facilities Council [ST/G00059X/1] Funding Source: researchfish
- STFC [ST/G00059X/1] Funding Source: UKRI
- Direct For Mathematical & Physical Scien
- Division Of Physics [0835713] Funding Source: National Science Foundation
Modern graphics hardware is designed for highly parallel numerical tasks and promises significant cost and performance benefits for many scientific applications. One such application is lattice quantum chromodynamics (lattice QCD), where the main computational challenge is to efficiently solve the discretized Dirac equation in the presence of an SU(3) gauge field. Using NVIDIA's CUDA platform, we have implemented a Wilson-Dirac sparse matrix-vector product that performs at up to 40, 135 and 212 Gflops for double, single and half precision, respectively, on NVIDIA's GeForce GTX 280 GPU. We have developed a new mixed precision approach for Krylov solvers using reliable updates, which allows for full double precision accuracy while using only single or half precision arithmetic for the bulk of the computation. The resulting BiCGstab and CG solvers run in excess of 100 Gflops and, in terms of iterations until convergence, perform better than the usual defect-correction approach for mixed precision. (C) 2010 Elsevier B.V. All rights reserved.
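The reliable-update idea can be illustrated with a small CPU-only sketch. This is not the authors' CUDA implementation: the operator `apply_a` is a toy symmetric positive definite stencil standing in for the Wilson-Dirac matrix, single precision is emulated by rounding stored values with the hypothetical helper `f32`, and the names `mixed_cg`, `delta` and `tol` are assumptions made for illustration. The bulk of the CG iteration runs on rounded vectors; whenever the iterated residual has dropped by the factor `delta`, the low-precision solution is folded into a double-precision accumulator and the true residual is recomputed in double precision.

```python
import math
import struct


def f32(v):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", v))[0]


def apply_a(v, lo=False):
    """Toy SPD operator (1-D Laplacian plus mass term), NOT the Wilson-Dirac matrix.

    With lo=True the result is rounded to single precision.
    """
    n = len(v)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        y = 2.5 * v[i] - left - right
        out.append(f32(y) if lo else y)
    return out


def dot(a, b):
    """Inner product, accumulated in double precision."""
    return sum(x * y for x, y in zip(a, b))


def mixed_cg(b, tol=1e-10, delta=0.1, max_iter=1000):
    """CG with emulated single-precision vectors and double-precision reliable updates.

    Returns (solution, iterations used, number of reliable updates).
    """
    n = len(b)
    b_norm = math.sqrt(dot(b, b))
    x = [0.0] * n                      # double-precision solution accumulator
    x_lo = [0.0] * n                   # low-precision partial solution
    r_lo = [f32(v) for v in b]         # residual for x = 0, rounded
    p = list(r_lo)
    rr = dot(r_lo, r_lo)
    max_r = math.sqrt(rr)              # largest residual norm since last update
    updates = 0
    for it in range(1, max_iter + 1):
        ap = apply_a(p, lo=True)
        alpha = rr / dot(p, ap)
        x_lo = [f32(xi + alpha * pi) for xi, pi in zip(x_lo, p)]
        r_lo = [f32(ri - alpha * ai) for ri, ai in zip(r_lo, ap)]
        rr_new = dot(r_lo, r_lo)
        r_norm = math.sqrt(rr_new)
        if r_norm < delta * max_r or r_norm < tol * b_norm:
            # Reliable update: accumulate x and recompute the true residual in double.
            x = [xi + xl for xi, xl in zip(x, x_lo)]
            r = [bi - ai for bi, ai in zip(b, apply_a(x))]
            updates += 1
            if math.sqrt(dot(r, r)) < tol * b_norm:
                return x, it, updates
            x_lo = [0.0] * n
            r_lo = [f32(v) for v in r]
            rr_new = dot(r_lo, r_lo)
            max_r = math.sqrt(rr_new)
        elif r_norm > max_r:
            max_r = r_norm
        beta = rr_new / rr
        p = [f32(ri + beta * pi) for ri, pi in zip(r_lo, p)]  # search direction is kept
        rr = rr_new
    return x, max_iter, updates
```

Calling `mixed_cg(b)` on a right-hand side `b` returns the accumulated double-precision solution together with the iteration and update counts. In contrast to a defect-correction scheme, which restarts the inner solver after each correction, this sketch keeps the search direction `p` across updates; the choice `delta = 0.1` is an arbitrary illustrative value.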