Article

High-order accurate simulation of incompressible turbulent flows on many parallel GPUs of a hybrid-node supercomputer

Journal

COMPUTER PHYSICS COMMUNICATIONS
Volume 244, Pages 132-142

Publisher

ELSEVIER
DOI: 10.1016/j.cpc.2019.06.012

Keywords

GPU computing; CUDA; High-order accurate methods; Transitional and turbulent flows; Hybrid supercomputing

Funding

  1. Platform for Advanced Scientific Computing (PASC), Switzerland

Turbulent incompressible flows play an important role in a broad range of natural and industrial processes. High-order direct numerical simulations are often used for resolving the spatio-temporal scales of such flows. Such high-fidelity simulations require an extensive computational layout, which often results in prohibitive computational costs. Recent advances in modern computing platforms, such as GPU-powered hybrid-node supercomputers, appear to be an enabler for high-fidelity CFD at large scales. In this work, we propose methods for accelerating a distributed-memory high-order incompressible Navier-Stokes solver by using NVIDIA Pascal GPUs of a Cray XC40/50 supercomputer. Arithmetically intensive or chronically invoked routines were ported to the GPUs using CUDA C. Host-side driver routines were developed to invoke CUDA C external kernels from the FORTRAN legacy code. Numerical methods for some of the most intensive operations, namely multigrid preconditioners, were modified to suit the SIMD standard for graphics processors. Customized unit testing was performed to ensure double-precision accuracy of GPU computations. The optimization layer maintained the memory structure of the legacy code. Post-profiling confirms that backbone distributed-memory communications increase the number of dynamic CPU-GPU memory copies, which offsets a part of the computational performance. Strong scalability of the entire flow solver and of the stand-alone pressure solver has been examined on up to 512 P100 GPUs. Strong scaling efficiency decreased for higher numbers of GPUs, probably due to a less favorable communication-to-computation ratio. Weak scalability of the entire solver was tested on up to 4096 P100 GPUs for two problems of different sizes. The solver maintained nearly ideal weak scalability for the larger problem, illustrating the potential of GPUs in dealing with highly resolved flows.
The GPU-enabled solver is finally deployed for the scale-resolving simulation of flow transition in the wake of a solid sphere at Re=3700, utilizing 192 GPUs. The time-averaged pressure coefficient along the sphere surface was in good agreement with previously reported data acquired from CPU-based direct numerical simulations and experiments. (C) 2019 Elsevier B.V. All rights reserved.
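The abstract mentions host-side driver routines that let the FORTRAN legacy code invoke CUDA C kernels, but does not show the pattern. A minimal sketch of how such an interface is commonly built is given below; the kernel, the driver name, and all signatures are illustrative assumptions, not the authors' actual code.

```cuda
// Illustrative sketch only: a double-precision AXPY-style kernel with a
// C-linkage host driver, callable from Fortran via an ISO_C_BINDING
// interface. All names and signatures are assumptions for illustration.
#include <cuda_runtime.h>

__global__ void axpy_kernel(int n, double a, const double *x, double *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];   // double precision throughout
}

// Host-side driver with C linkage so a Fortran interface declared with
// bind(C, name="axpy_driver") can call it directly.
extern "C" void axpy_driver(int n, double a, const double *x_dev, double *y_dev)
{
    const int block = 256;
    const int grid  = (n + block - 1) / block;   // ceil(n / block)
    axpy_kernel<<<grid, block>>>(n, a, x_dev, y_dev);
    cudaDeviceSynchronize();  // ensure completion before returning to Fortran
}
```

On the Fortran side, a matching `interface` block with `bind(C)` would declare the driver, which keeps the legacy arrays and calling conventions untouched, consistent with the abstract's statement that the optimization layer preserved the legacy code's memory structure.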

