4.7 Article

Advanced parallel implementation of the coupled ocean-ice model FEMAO (version 2.0) with load balancing

Journal

GEOSCIENTIFIC MODEL DEVELOPMENT
Volume 14, Issue 2, Pages 843-857

Publisher

COPERNICUS GESELLSCHAFT MBH
DOI: 10.5194/gmd-14-843-2021

Funding

  1. Russian Foundation for Basic Research [18-05-60184, 19-35-90023]
  2. Ministry of Education and Science of the Russian Federation [075-15-2019-1624]

This paper presents an MPI-based parallel version of the finite-element model of the Arctic Ocean (FEMAO) configured for the White Sea. The model consists of an ocean dynamics part and a surface ice dynamics part, which differ in computational cost because the complexity of the ocean part depends on the bottom depth while that of the ice part does not. By placing both submodels on the same CPU cores with a common horizontal partition and using Hilbert-curve balancing, the authors obtain parallel acceleration and improved load balance.
In this paper, we present a parallel version of the finite-element model of the Arctic Ocean (FEMAO) configured for the White Sea and based on MPI technology. This model consists of two main parts: an ocean dynamics model and a surface ice dynamics model. These parts differ greatly in the number of computations because the complexity of the ocean part depends on the bottom depth, while that of the sea-ice component does not. As a first step, we decided to locate both submodels on the same CPU cores with a common horizontal partition of the computational domain. The model domain is divided into small blocks, which are distributed over the CPU cores using Hilbert-curve balancing. Partitioning of the model domain is static (i.e., computed during the initialization stage). There are three baseline options: a single block per core, balancing of 2D computations, and balancing of 3D computations. After showing parallel acceleration for particular ocean and ice procedures, we construct the common partition, which minimizes joint imbalance in both submodels. Our novelty is using arrays shared by all blocks that belong to a CPU core instead of allocating separate arrays for each block, as is usually done. Computations on a CPU core are restricted by the masks of non-land grid nodes and block-core correspondence. This approach keeps the parallel implementation as simple as with the usual decomposition into squares while improving load balance. We demonstrate parallel acceleration on up to 996 cores for the model with a resolution of 500 x 500 x 39 in the ocean component and 43 sea-ice scalars, and we carry out a detailed analysis of the effect of different partitions on the model runtime.
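The block distribution described above can be illustrated with a minimal Python sketch. It is not the FEMAO implementation. It assumes a square grid of blocks whose side is a power of two, and all function names (hilbert_index, block_weights, partition_blocks, core_mask) are hypothetical. The block weight is the sum of vertical levels per node for 3D balancing, or the count of non-land nodes for 2D balancing. Blocks are ordered along a Hilbert curve and cut into contiguous segments of roughly equal weight, one per core. The last function builds the per-core mask of non-land nodes that restricts computations on arrays shared by all of a core's blocks.

```python
import numpy as np

def hilbert_index(order, x, y):
    """Position of block (x, y) along a Hilbert curve on a 2**order square grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d

def block_weights(levels, block_size):
    """Sum per-node work over each block.

    levels: 2D integer array, number of vertical levels per node (0 on land).
    For 2D balancing pass (levels > 0).astype(int) instead.
    """
    nb = levels.shape[0] // block_size
    return levels.reshape(nb, block_size, nb, block_size).sum(axis=(1, 3))

def partition_blocks(weights, n_cores):
    """Assign blocks to cores as contiguous Hilbert-curve segments of similar weight.

    Returns an integer array of block owners; land-only blocks get -1.
    A core may receive no blocks if a single block is very heavy.
    """
    nb = weights.shape[0]
    order = nb.bit_length() - 1
    assert weights.shape == (nb, nb) and nb == 1 << order  # assumption of this sketch
    blocks = sorted((hilbert_index(order, i, j), i, j)
                    for i in range(nb) for j in range(nb) if weights[i, j] > 0)
    total = float(sum(weights[i, j] for _, i, j in blocks))
    owner = np.full(weights.shape, -1, dtype=int)
    acc = 0.0
    for _, i, j in blocks:
        # The midpoint of the block's weight interval decides which core owns it.
        mid = acc + 0.5 * weights[i, j]
        owner[i, j] = min(int(mid * n_cores / total), n_cores - 1)
        acc += weights[i, j]
    return owner

def core_mask(owner, wet, rank, block_size):
    """Mask of non-land nodes that belong to blocks owned by core `rank`."""
    node_owner = np.repeat(np.repeat(owner, block_size, axis=0), block_size, axis=1)
    return (node_owner == rank) & wet

if __name__ == "__main__":
    # Synthetic 8 x 8 domain: land on the right, a deep basin in the middle.
    levels = np.zeros((8, 8), dtype=int)
    levels[:, :5] = 3
    levels[2:6, 1:4] = 10
    w3d = block_weights(levels, block_size=2)
    owner = partition_blocks(w3d, n_cores=3)
    loads = [int(w3d[owner == r].sum()) for r in range(3)]
    print(owner)
    print("3D load per core:", loads)
```

Because the partition is a cut of a one-dimensional ordering, it can be computed once at initialization, matching the static partitioning described in the abstract. Passing 2D weights instead of 3D weights gives the other balancing option. A common partition for both submodels would need a combined weight, which this sketch does not attempt.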
