Journal
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
Volume 31, Issue 5, Pages 1036-1047
Publisher
IEEE COMPUTER SOC
DOI: 10.1109/TPDS.2019.2961909
Keywords
CUDA; multi-GPU; MPI; dynamic load balancing; Hilbert space filling curves; multi-resolution grid; shallow water equations (SWE); AMR
Funding
- CSCS (Switzerland)
- CINECA (Italy)
- STFC (U.K.)
- Italian MIUR, under the Scientific Independence of young researchers program [RBSI14R1GP, D92I15000190001]
- Italian INdAM-GNCS Project 2019
This article presents a multi-GPU implementation of a Finite-Volume solver on a multi-resolution grid. The implementation completely offloads the computation to the GPUs, and communications between different GPUs are implemented by means of the Message Passing Interface (MPI) API. Different domain decomposition techniques have been considered, and the one based on Hilbert Space Filling Curves (HSFC) showed optimal scalability. Several optimizations are introduced: one-to-one MPI communications among MPI ranks are completely masked by GPU computations on internal cells, and a novel dynamic load balancing algorithm minimizes the waiting times at global MPI synchronization barriers. This algorithm adapts the computational load of the ranks in response to dynamical changes in the execution time of blocks and in network performance; its capability to converge to a balanced computation has been shown empirically by numerical experiments. Tests exploit up to 64 GPUs and 83M cells and achieve an efficiency of 90 percent in weak scaling and 85 percent in strong scaling. The framework is general, and the results of the article can be ported to a wide range of explicit 2D Partial Differential Equation solvers.
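The abstract reports that HSFC-based domain decomposition gave the best scalability and that the load of each rank is adapted to the measured execution time of blocks. A minimal Python sketch of the general HSFC idea follows, under stated assumptions: the domain is split into an n x n grid of blocks (n a power of two), each block has a known cost, blocks are ordered along the Hilbert curve with the standard xy-to-distance algorithm, and the curve is cut into contiguous, roughly equal-cost chunks, one per MPI rank. The names hilbert_index and partition_blocks and the midpoint cut rule are hypothetical choices for illustration; this is not the article's implementation, and the article's novel load balancing algorithm is not reproduced here.

```python
from typing import Dict, Tuple

Block = Tuple[int, int]  # hypothetical (bx, by) block coordinates on an n x n block grid


def _rot(n: int, x: int, y: int, rx: int, ry: int) -> Tuple[int, int]:
    """Flip/transpose a quadrant so the sub-curve inside it has canonical orientation."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y


def hilbert_index(n: int, x: int, y: int) -> int:
    """Distance of cell (x, y) along the Hilbert curve filling an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)  # which of the 4 quadrants, in curve order
        x, y = _rot(n, x, y, rx, ry)
        s //= 2
    return d


def partition_blocks(n: int, cost: Dict[Block, float], nranks: int) -> Dict[Block, int]:
    """Assign blocks to ranks as contiguous Hilbert-ordered chunks of roughly equal total cost."""
    order = sorted(cost, key=lambda b: hilbert_index(n, b[0], b[1]))
    total = sum(cost.values())
    owner: Dict[Block, int] = {}
    prefix = 0.0
    for b in order:
        # The midpoint of this block's cost interval decides whose share it falls in;
        # ranks are therefore non-decreasing along the curve (contiguous chunks).
        mid = prefix + 0.5 * cost[b]
        owner[b] = min(nranks - 1, int(mid * nranks / total))
        prefix += cost[b]
    return owner
```

Because consecutive Hilbert indices are always neighbouring blocks, each rank receives a spatially compact region, which keeps the boundary (and hence the MPI halo exchange) small. Calling partition_blocks again with per-block times measured during a run, instead of uniform costs, is one simple way to shift work between ranks; the article's algorithm also responds to changes in network performance, which this sketch does not model.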