Article

High-order accurate simulation of incompressible turbulent flows on many parallel GPUs of a hybrid-node supercomputer

Journal

COMPUTER PHYSICS COMMUNICATIONS
Volume 244, Pages 132-142

Publisher

ELSEVIER
DOI: 10.1016/j.cpc.2019.06.012

Keywords

GPU computing; CUDA; High-order accurate methods; Transitional and turbulent flows; Hybrid supercomputing

Funding

  1. Platform for Advanced Scientific Computing (PASC), Switzerland

Turbulent incompressible flows play an important role in a broad range of natural and industrial processes. High-order direct numerical simulations are often used for resolving the spatio-temporal scales of such flows. Such high-fidelity simulations require an extensive computational layout, which often results in prohibitive computational costs. Recent advances in modern computing platforms, such as GPU-powered hybrid-node supercomputers, appear to be an enabler for high-fidelity CFD at large scales. In this work, we propose methods for accelerating a distributed-memory high-order incompressible Navier-Stokes solver by using NVIDIA Pascal GPUs of a Cray XC40/50 supercomputer. Arithmetically intensive or chronically invoked routines were ported to the GPUs using CUDA C. Host-side driver routines were developed to invoke CUDA C external kernels from the FORTRAN legacy code. Numerical methods for some of the most intensive operations, namely multigrid preconditioners, were modified to suit the SIMD standard for graphics processors. Customized unit testing was performed to ensure double-precision accuracy of GPU computations. The optimization layer maintained the memory structure of the legacy code. Post-profiling confirms that backbone distributed-memory communications increase the number of dynamic CPU-GPU memory copies, which offsets a part of the computational performance. Strong scalability of the entire flow solver and of the stand-alone pressure solver has been examined on up to 512 P100 GPUs. Strong scaling efficiency decreased for higher numbers of GPUs, probably due to a less favorable communication-to-computation ratio. Weak scalability of the entire solver was tested on up to 4096 P100 GPUs for two problems of different sizes. The solver maintained nearly ideal weak scalability for the larger problem, illustrating the potential of GPUs in dealing with highly resolved flows.
The GPU-enabled solver is finally deployed for the scale-resolving simulation of flow transition in the wake of a solid sphere at Re=3700, utilizing 192 GPUs. The time-averaged pressure coefficient along the sphere surface was in good agreement with previously reported data acquired from CPU-based direct numerical simulations and experiments. (C) 2019 Elsevier B.V. All rights reserved.
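The abstract mentions host-side driver routines that let the FORTRAN legacy code invoke CUDA C kernels, but does not show the pattern. A minimal sketch of how such an interface is commonly built is given below; the kernel, the driver name, and all signatures are illustrative assumptions, not the authors' actual code.

```cuda
// Illustrative sketch only: a double-precision AXPY-style kernel with a
// C-linkage host driver, callable from Fortran via an ISO_C_BINDING
// interface. All names and signatures are assumptions for illustration.
#include <cuda_runtime.h>

__global__ void axpy_kernel(int n, double a, const double *x, double *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];   // double precision throughout
}

// Host-side driver with C linkage so a Fortran interface declared with
// bind(C, name="axpy_driver") can call it directly.
extern "C" void axpy_driver(int n, double a, const double *x_dev, double *y_dev)
{
    const int block = 256;
    const int grid  = (n + block - 1) / block;   // ceil(n / block)
    axpy_kernel<<<grid, block>>>(n, a, x_dev, y_dev);
    cudaDeviceSynchronize();  // ensure completion before returning to Fortran
}
```

On the Fortran side, a matching `interface` block with `bind(C)` would declare the driver, which keeps the legacy arrays and calling conventions untouched, consistent with the abstract's statement that the optimization layer preserved the legacy code's memory structure.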

