Article

GPU Acceleration of Large-Scale Full-Frequency GW Calculations

Journal

JOURNAL OF CHEMICAL THEORY AND COMPUTATION
Volume 18, Issue 8, Pages 4690-4707

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.jctc.2c00241

Funding

  1. Midwest Integrated Center for Computational Materials (MICCoM) - U.S. Department of Energy, Office of Science, Basic Energy Sciences, Materials Sciences, and Engineering Division through Argonne National Laboratory (ANL)
  2. Office of Science of the U.S. Department of Energy [DE-AC05-00OR22725]
  3. U.S. Department of Energy Office of Science User Facility [DE-AC02-05CH11231, DE-AC02-06CH11357]
  4. NERSC Exascale Application Readiness Program

The study focuses on GPU acceleration of the full-frequency GW method implemented in the WEST code. It achieves excellent performance through optimized GPU libraries, a hierarchical parallelization strategy, nonblocking MPI communications, and mixed precision. Benchmark tests on high-performance computing systems show a significant speedup of the GPU-accelerated version of WEST over its CPU version, with good strong and weak scaling on up to 25,920 GPUs.

Many-body perturbation theory is a powerful method for simulating electronic excitations in molecules and materials, starting from the output of density functional theory calculations. Implementing the theory efficiently enough to run at scale on the latest leadership high-performance computing systems makes it possible to extend the scope of GW calculations. We present a GPU acceleration study of the full-frequency GW method as implemented in the WEST code. Excellent performance is achieved through the use of (i) optimized GPU libraries, e.g., cuFFT and cuBLAS, (ii) a hierarchical parallelization strategy that minimizes CPU-CPU, CPU-GPU, and GPU-GPU data transfer operations, (iii) nonblocking MPI communications that overlap with GPU computations, and (iv) mixed precision in selected portions of the code. A series of performance benchmarks has been carried out on leadership high-performance computing systems, showing a substantial speedup of the GPU-accelerated version of WEST with respect to its CPU version. Good strong and weak scaling is demonstrated using up to 25,920 GPUs. Finally, we showcase the capability of the GPU version of WEST for large-scale, full-frequency GW calculations of realistic systems, e.g., a nanostructure, an interface, and a defect, comprising up to 10,368 valence electrons.
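
Strong and weak scaling results like those above are usually summarized as parallel efficiencies. The Python sketch below uses the standard textbook definitions. Treat these definitions as an assumption: they may differ from the exact metrics the paper reports. The function names and all timings are hypothetical and are not data from the study.

```python
def strong_scaling_efficiency(t_ref: float, n_ref: int, t: float, n: int) -> float:
    """Fixed total problem size: ideally the wall time falls as n_ref / n.

    Returns (t_ref * n_ref) / (t * n); 1.0 means perfect strong scaling.
    """
    return (t_ref * n_ref) / (t * n)


def weak_scaling_efficiency(t_ref: float, t: float) -> float:
    """Problem size grows in proportion to the GPU count: ideally time stays constant.

    Returns t_ref / t; 1.0 means perfect weak scaling.
    """
    return t_ref / t


def speedup(t_baseline: float, t_accelerated: float) -> float:
    """Ratio of baseline (e.g., CPU) time to accelerated (e.g., GPU) time."""
    return t_baseline / t_accelerated


if __name__ == "__main__":
    # Hypothetical timings in seconds, chosen only for illustration.
    print(round(strong_scaling_efficiency(100.0, 1000, 14.0, 8000), 3))  # 0.893
    print(round(weak_scaling_efficiency(100.0, 110.0), 3))  # 0.909
    print(speedup(500.0, 50.0))  # 10.0
```

Strong scaling keeps the total problem fixed while adding GPUs. Weak scaling grows the problem along with the GPU count, so a flat wall time is the ideal outcome.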
