4.7 Article

swMPAS-A: Scaling MPAS-A to 39 Million Heterogeneous Cores on the New Generation Sunway Supercomputer

期刊

出版社

IEEE COMPUTER SOC
DOI: 10.1109/TPDS.2022.3215002

关键词

MPAS-A; atmosphere science; heterogeneous core; sunway supercomputer; I/O

向作者/读者索取更多资源

With the advancement of high-performance computing systems, the efficiency of scientific applications, particularly in the field of atmospheric modeling, is hindered by the suboptimal design of the input/output (I/O) module. To address this bottleneck, researchers propose a custom data reorganization method and various computation acceleration techniques. Experimental results show significant improvements in scalability and efficiency.
With the computing power of High-Performance Computing (HPC) systems having stepped into the exascale era, more complex problems can be solved with scientific applications on a large scale. However, due to the significant performance gap between computing nodes and storage subsystems, suboptimal design for the Input/Output (I/O) module will significantly impede the efficiency of scientific applications, especially for the ubiquitous atmosphere applications. Two-phase I/O implemented in N-to-1 mode creates a serious bottleneck that hinders the scalability for the Model for Prediction Across Scales-Atmosphere (MPAS-A) on the new generation Sunway supercomputer. To address the I/O problem, we apply a custom data reorganization method to enable N-to-M I/O mode to exploit the parallel file system's performance and limit the data transfer among MPI ranks to a restricted scope to alleviate communication overhead. Moreover, we have conducted several methods to accelerate the computations, including the redesign for tracer transport, a hybrid buffering scheme, and a three-level parallelization scheme, which allows MPAS-A to use all heterogeneous computing resources efficiently. Experimental results show admirable scalability and efficiency of our I/O method, which achieves speedups of 41x and 58.9x for input and output compared with the raw I/O method on 30,000 MPI ranks. By scaling MPAS-A to 39 million heterogeneous cores, we demonstrate the necessity of a well-constructed I/O module for a real-world atmosphere application. Speed tests show that our optimization methods obtain good results for computations, and MPAS-A achieves a speed of 0.82 Simulated Day per Hour (SDPH) and 0.76 parallel efficiency of strong scaling with 600,000 MPI ranks.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.7
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据