Article

Accelerating the Lagrangian simulation of water ages on distributed, multi-GPU platforms: The importance of dynamic load balancing

Journal

COMPUTERS & GEOSCIENCES
Volume 166

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.cageo.2022.105189

Keywords

Water age; Particle tracking; Multi-GPU with MPI; Domain decomposition; Load balancing

Funding

  1. U.S. Department of Energy Office of Science, Offices of Advanced Scientific Computing Research and Biological and Environmental Sciences
  2. Watershed Function Scientific Focus Area [DE-AC02-05CH11231]
  3. National Science Foundation


This paper presents a parallel approach for particle tracking simulations using multiple GPUs and MPI parallelism. Load balancing schemes are proposed to dynamically balance the distribution of particles, improving parallel scaling. The research demonstrates the practical importance of load balancing in achieving large-scale simulations.
Water age is a fundamental descriptor of the source, storage, and mixing of water in watersheds. The Lagrangian particle-tracking approach is a powerful tool for physically based modeling of water age distributions, but its application has been hampered because it is computationally demanding. Here, we present a parallel approach for particle tracking simulations. The approach combines multiple GPUs with MPI parallelism based on domain decomposition. An inherent challenge in the distributed parallelization of Lagrangian approaches is the disparity in computational work, or load imbalance (LIB), among processing elements (PEs). In this study, load balancing (LB) schemes are proposed to dynamically balance the distribution of particles across PEs at runtime. In the subsequent hillslope simulations, LIB was observed in all LB-disabled runs, e.g., a load ratio of 4.3 when using 2 GPUs in the test case. The LB schemes then accurately balanced the load distribution and improved parallel scaling. Additionally, the parallel approach showed an excellent overall speedup: a 25-fold improvement using 4 GPUs relative to 128 OpenMP threads. A regional-scale application further demonstrated the LB performance. The wall-clock time on 8 GPUs was reduced by 31.33% once LB was activated. Going from 8 GPUs with LB to 16 GPUs with LB showed good parallel scalability, reducing the wall-clock time by about 50%. This work shows how massively parallel computing can be applied to particle tracking in water age simulations. It also demonstrates the practical importance of load balancing in this context, which enables large-scale simulations with increasingly complex flow paths.
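
To make the load-imbalance idea concrete, the following is a minimal serial sketch in Python and is not the authors' implementation. It assumes a 1D domain decomposition along a single coordinate, synthetic particle positions, and a load ratio defined as the maximum particle count divided by the minimum count across PEs; the abstract does not state the exact definition used in the paper. The function names (`particle_counts`, `load_ratio`, `rebalance_cuts`) are hypothetical. The rebalancing step here simply moves subdomain boundaries to particle-position quantiles, which is one common way to equalize counts and may differ from the schemes proposed in the paper.

```python
import numpy as np


def particle_counts(x, cuts):
    """Count particles per subdomain; `cuts` holds the interior boundaries."""
    # Subdomain i owns particles with cuts[i-1] <= x < cuts[i].
    owner = np.searchsorted(cuts, x, side="right")
    return np.bincount(owner, minlength=len(cuts) + 1)


def load_ratio(counts):
    """Assumed metric: largest load over smallest load (1.0 is balanced)."""
    return counts.max() / max(counts.min(), 1)


def rebalance_cuts(x, n_pe):
    """Place interior boundaries at particle-position quantiles."""
    q = np.arange(1, n_pe) / n_pe
    return np.quantile(x, q)


# Synthetic, skewed particle positions in [0, 1] (illustrative only).
rng = np.random.default_rng(0)
x = rng.beta(2.0, 5.0, size=100_000)
n_pe = 4  # hypothetical number of processing elements (e.g., GPUs)

# Static decomposition: equal-width subdomains.
static_cuts = np.linspace(0.0, 1.0, n_pe + 1)[1:-1]
before = particle_counts(x, static_cuts)

# Balanced decomposition: equal-count subdomains.
after = particle_counts(x, rebalance_cuts(x, n_pe))

print("static  :", before, "load ratio = %.2f" % load_ratio(before))
print("balanced:", after, "load ratio = %.2f" % load_ratio(after))
```

In a distributed multi-GPU run, moving a boundary would also require migrating the affected particles between MPI ranks, and the rebalancing would be repeated as particles move. The sketch leaves out both steps.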

