4.7 Article

Heterogeneous CPU plus GPU parallelization for high-accuracy scale-resolving simulations of compressible turbulent flows on hybrid supercomputers

Journal

COMPUTER PHYSICS COMMUNICATIONS
Volume 271

Publisher

ELSEVIER
DOI: 10.1016/j.cpc.2021.108231

Keywords

Scale-resolving simulation; Unstructured mesh; Heterogeneous computing; CPU plus GPU; MPI plus OpenMP plus OpenCL; Hybrid supercomputer

Funding

  1. Russian Science Foundation [19-11-00299]


A heterogeneous parallel algorithm for simulating compressible turbulent flows and its portable software implementation are presented. The underlying numerical method is based on a family of higher-accuracy edge-based reconstruction schemes on unstructured mixed-element meshes. The proposed parallel solution can engage a large number of computing devices across most of the architectures used in modern supercomputers, including manycore CPUs and GPUs, and it can co-execute on CPUs and accelerators simultaneously. The multilevel parallel algorithm combines MPI for distributing the workload among hybrid cluster nodes and between devices within a node; OpenMP for manycore CPUs and other supporting devices, such as Intel Xeon Phi; and OpenCL for massively parallel accelerators, such as GPUs from various vendors, including NVIDIA, AMD, and Intel. The main focus is on adapting the numerical method and its computational algorithm to the stream processing parallel paradigm, while also taking into account the very limited device memory inherent in GPU computing. A detailed description of the parallel algorithm is presented, together with the techniques used for its efficient parallel implementation. Special attention is paid to implicit time integration with its linear solver and to the calculation of convective fluxes and viscous terms. The use of mixed floating-point precision and of overlapping communications with computations is also discussed. Parallel performance is demonstrated in practical applications on different kinds of supercomputers using up to 10 thousand cores and multiple GPUs of comparable overall performance. (C) 2021 Elsevier B.V. All rights reserved.
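The abstract says the main effort lies in adapting an edge-based scheme to the stream processing paradigm. A common way to do this, though not necessarily the authors' exact implementation, is to replace the "loop over edges and scatter to both end nodes" pattern, which causes write conflicts when run in parallel, with two conflict-free stages: one work-item per edge computes and stores that edge's flux, then one work-item per node gathers the fluxes of its incident edges through a CSR-style node-to-edge list. The sketch below is a minimal illustration under stated assumptions. The toy mesh, the weights, the sign convention (flux leaves the edge's first node), and the names `edge_flux`, `residual_scatter`, `build_node_to_edge`, and `residual_two_stage` are all hypothetical, and plain Python loops stand in for OpenCL or OpenMP kernels.

```python
"""Edge-based residual assembly: scatter vs. two-stage (flux per edge, gather per node).

Hypothetical toy mesh and placeholder flux; this illustrates the data layout only.
"""

# Tiny hypothetical mesh: 4 nodes, 5 edges (a square with one diagonal).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
WEIGHTS = [1.0, 0.5, 2.0, 1.5, 0.25]  # stand-in for edge geometry coefficients
N_NODES = 4


def edge_flux(ui, uj, w):
    """Placeholder flux: central average times a weight (not a real Riemann solver)."""
    return 0.5 * (ui + uj) * w


def residual_scatter(u):
    """Sequential reference: loop over edges and scatter to both end nodes.

    Run in parallel, the two updates per edge would race on shared nodes
    unless atomics or edge colouring were used.
    """
    r = [0.0] * N_NODES
    for (a, b), w in zip(EDGES, WEIGHTS):
        f = edge_flux(u[a], u[b], w)
        r[a] += f  # assumed convention: flux leaves the first node
        r[b] -= f  # ...and enters the second node
    return r


def build_node_to_edge(edges, n_nodes):
    """CSR-style adjacency: node i's incident edges are ids[ptr[i]:ptr[i + 1]]."""
    incident = [[] for _ in range(n_nodes)]
    for e, (a, b) in enumerate(edges):
        incident[a].append(e)
        incident[b].append(e)
    ptr = [0]
    for lst in incident:
        ptr.append(ptr[-1] + len(lst))
    ids = [e for lst in incident for e in lst]
    return ptr, ids


def residual_two_stage(u, ptr, ids):
    """Stream-friendly assembly with no concurrent writes to the same location."""
    # Stage 1: one independent work-item per edge; each writes only flux[e].
    flux = [edge_flux(u[a], u[b], w) for (a, b), w in zip(EDGES, WEIGHTS)]
    # Stage 2: one independent work-item per node; each reads its incident
    # edges and writes only r[i].
    r = [0.0] * N_NODES
    for i in range(N_NODES):
        s = 0.0
        for e in ids[ptr[i]:ptr[i + 1]]:
            s += flux[e] if EDGES[e][0] == i else -flux[e]
        r[i] = s
    return r


if __name__ == "__main__":
    u = [1.0, 2.0, 3.0, 4.0]
    ptr, ids = build_node_to_edge(EDGES, N_NODES)
    print(residual_scatter(u))
    print(residual_two_stage(u, ptr, ids))
```

The two-stage form spends extra memory on the per-edge `flux` array in exchange for removing write conflicts, which suits GPU kernels. Because the abstract stresses the very limited device memory, an implementation would need to weigh that buffer against alternatives such as atomics or edge colouring.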

