Article

Accelerating fully resolved simulation of particle-laden flows on heterogeneous computer architectures

Journal

PARTICUOLOGY
Volume 81, Pages 25-37

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.partic.2022.12.010

Keywords

Lattice Boltzmann method; Immersed boundary method; Particle-laden flows; Heterogeneous acceleration; Graphics Processing Units

Abstract

An efficient computing framework, PFlows, for fully resolved direct numerical simulation of particle-laden flows was accelerated on NVIDIA Graphics Processing Units (GPUs) and GPU-like accelerator (DCU) cards. The framework couples the lattice Boltzmann method for fluid flow, the immersed boundary method for fluid-particle interaction, and the discrete element method for particle collision, using two fixed Eulerian meshes and one moving Lagrangian point mesh, respectively. All parts are accelerated by a fine-grained parallelism technique using CUDA on GPUs and HIP on DCU cards, i.e., the calculation on each fluid grid node, each immersed boundary point, each particle motion, and each pair-wise particle collision is handled by one compute thread. Coalesced memory access to the LBM distribution functions, stored in a Structure-of-Arrays data layout, is used to maximize utilization of the hardware bandwidth. Parallel reduction in shared memory is adopted for the data of immersed boundary points to reduce global memory accesses when integrating the hydrodynamic force on each particle. MPI is further used for computing on heterogeneous architectures with multiple CPUs and GPUs/DCUs, and communication between adjacent processors is hidden by overlapping it with computation. Two benchmark cases, a pure fluid flow and a particle-laden flow, were conducted for code validation. On a single accelerator, a V100 GPU achieves a 7.1-11.1x speedup and a single DCU a 5.6-8.8x speedup over a single Xeon CPU chip (32 cores). On multiple accelerators, the parallel efficiency is 0.5-0.8 for weak scaling and 0.68-0.9 for strong scaling on up to 64 DCU cards, even for a dense flow (solid volume fraction φ = 20%). The peak performance reaches 179 giga lattice updates per second (GLUPS) on 256 DCU cards with 1 billion grid nodes and 1 million particles. Finally, a large-scale simulation of a gas-solid flow with 1.6 billion grid nodes and 1.6 million particles was conducted using only 32 DCU cards. These results show that the present framework is promising for simulations of large-scale particle-laden flows in the upcoming exascale computing era. (c) 2022 Chinese Society of Particuology and Institute of Process Engineering, Chinese Academy of Sciences. Published by Elsevier B.V. All rights reserved.
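The thread-per-element mapping and Structure-of-Arrays (SoA) layout described in the abstract can be illustrated with a short CUDA sketch (the HIP version would be nearly identical). This is a minimal example under assumed names and sizes (`f`, `feq`, `collideBGK`, a D3Q19 lattice), not the PFlows implementation:

```cuda
#include <cuda_runtime.h>

#define Q  19                     // D3Q19 velocity set (assumed for illustration)
#define NX 128
#define NY 128
#define NZ 128
#define N  (NX * NY * NZ)

// Structure-of-Arrays layout: f[q * N + cell] stores the q-th distribution
// function of all cells contiguously, so consecutive threads of a warp read
// consecutive addresses and the loads/stores coalesce.
__global__ void collideBGK(float *f, const float *feq, float omega)
{
    int cell = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per fluid cell
    if (cell >= N) return;

    for (int q = 0; q < Q; ++q) {
        int idx = q * N + cell;                 // SoA index: coalesced across the warp
        f[idx] -= omega * (f[idx] - feq[idx]);  // BGK relaxation toward equilibrium
    }
}

// Launch with one thread per cell:
//   collideBGK<<<(N + 255) / 256, 256>>>(d_f, d_feq, omega);
```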
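The shared-memory reduction used when integrating the hydrodynamic force could look like the following sketch, assuming one thread block per particle, a power-of-two block size no smaller than the number of boundary points per particle, and hypothetical names (`pointForceX`, `particleForceX`); only the x-component is shown:

```cuda
// One block per particle (assumed mapping): each thread loads the x-force of
// one immersed-boundary point into shared memory, a tree reduction sums them,
// and only thread 0 writes the total back, so global memory is touched once
// per particle instead of once per point.
__global__ void reduceIBForceX(const float *pointForceX, float *particleForceX,
                               int pointsPerParticle)
{
    extern __shared__ float s[];
    int tid = threadIdx.x;
    int p   = blockIdx.x;                              // particle index

    s[tid] = (tid < pointsPerParticle)
           ? pointForceX[p * pointsPerParticle + tid]  // this particle's points
           : 0.0f;
    __syncthreads();

    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) s[tid] += s[tid + stride];   // pairwise partial sums
        __syncthreads();
    }
    if (tid == 0) particleForceX[p] = s[0];            // single global store
}

// Launch (block size >= pointsPerParticle, power of two, e.g. 256):
//   reduceIBForceX<<<numParticles, 256, 256 * sizeof(float)>>>(d_fx, d_Fx, nPts);
```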
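The communication hiding mentioned in the abstract commonly follows a two-stream pattern; the sketch below is under assumptions (a CUDA-aware MPI build so device pointers can be passed to MPI directly, hypothetical `packHalo`/`unpackHalo`/`collideInterior` kernels, and neighbor ranks `left`/`right`), and the paper's actual scheme may differ:

```cuda
#include <mpi.h>
#include <cuda_runtime.h>

// Hypothetical kernels standing in for the framework's real routines:
__global__ void packHalo(const float *f, float *buf, int n);    // copy boundary layers out
__global__ void unpackHalo(float *f, const float *buf, int n);  // copy received layers in
__global__ void collideInterior(float *f, const float *feq,
                                float omega, int nInterior);    // bulk update

// Overlap halo exchange with the interior update: boundary data is packed,
// exchanged via MPI, and unpacked on one stream while the interior cells are
// computed on another, hiding the communication latency.
void overlappedStep(float *d_f, const float *d_feq, float omega,
                    float *d_send, float *d_recv,
                    int haloCount, int nInterior, int left, int right)
{
    cudaStream_t sInterior, sHalo;
    cudaStreamCreate(&sInterior);
    cudaStreamCreate(&sHalo);

    packHalo<<<(haloCount + 255) / 256, 256, 0, sHalo>>>(d_f, d_send, haloCount);
    collideInterior<<<(nInterior + 255) / 256, 256, 0, sInterior>>>(
        d_f, d_feq, omega, nInterior);

    cudaStreamSynchronize(sHalo);                         // send buffer is ready
    MPI_Sendrecv(d_send, haloCount, MPI_FLOAT, left,  0,  // CUDA-aware MPI assumed:
                 d_recv, haloCount, MPI_FLOAT, right, 0,  // device pointers passed directly
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    unpackHalo<<<(haloCount + 255) / 256, 256, 0, sHalo>>>(d_f, d_recv, haloCount);
    cudaStreamSynchronize(sInterior);                     // join both streams before
    cudaStreamSynchronize(sHalo);                         // the next time step

    cudaStreamDestroy(sInterior);
    cudaStreamDestroy(sHalo);
}
```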
