☆ 4.7 Article

A simple one-step index algorithm for implementation of lattice Boltzmann method on GPU

COMPUTER PHYSICS COMMUNICATIONS (2023)

Journal

COMPUTER PHYSICS COMMUNICATIONS

Volume 283, Issue -, Pages -

Publisher

ELSEVIER

DOI: 10.1016/j.cpc.2022.108603

Keywords

Lattice Boltzmann method; One-step index algorithm; High-performance computing; Multi-GPUs

Categories

Computer Science, Interdisciplinary Applications; Physics, Mathematical

Summary

We propose a simple one-step index (OSI) algorithm for solving the lattice Boltzmann equation that streams the particle distribution functions (PDFs) on a single grid system. Derived from the conventional A-B pattern, the algorithm keeps the memory addresses of the PDFs fixed, consistent with collision principles, and computes the streaming process implicitly by reassigning their indexes. It is simple to program and well suited to GPUs, reaching 8.4 giga lattice updates per second with CUDA on an NVIDIA A100.

Abstract

We propose a simple one-step index (OSI) algorithm for solving the lattice Boltzmann equation, specifically for streaming the particle distribution functions (PDFs) on a single grid system. The OSI algorithm is derived from the conventional A-B pattern. The memory addresses of the PDFs are fixed and consistent with collision principles. The streaming process is computed implicitly by reassigning the PDF indexes according to the time step, spatial coordinates, and direction of each PDF. The algorithm is simple to program because it reads and writes the PDFs only once per time step and does not require synchronization between odd and even time steps. In this implementation, the PDFs are stored in a structure of arrays (SoA) layout, which suits the memory access pattern of graphics processing units (GPUs). The accuracy and single-precision performance of the algorithm were validated and tested for three-dimensional lid-driven cavity flow with the D3Q19 model on an NVIDIA A100 40 GB PCIe GPU using CUDA and OpenACC. Performances of 8.4 and 8.1 giga lattice updates per second were obtained with CUDA and OpenACC, respectively, so OpenACC reaches roughly 95% of the CUDA performance with significantly less programming work. The bandwidth usage rates on a single GPU were 96% and 94% for CUDA and OpenACC, respectively, close to the theoretical values. For multi-GPU runs, the lattice Boltzmann method was parallelized using CUDA and MPI. Finally, computation and communication were overlapped to optimize parallel efficiency; the weak-scaling parallel efficiency exceeded 0.98 on up to 512 GPUs. © 2022 Elsevier B.V. All rights reserved.
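The idea of streaming implicitly through index reassignment on a single SoA array can be illustrated with a toy model. The sketch below is not the paper's OSI formula, which the abstract does not give; it assumes a one-dimensional periodic D1Q3 lattice and a hypothetical mapping `slot_of` in which the value of direction `q` at position `x` and time `t` lives in slot `(x - c_q * t) mod n`. Under that assumption, writing the post-collision value back to the slot it was read from already places it where the neighbouring node will look for it at the next step. All names (`soa`, `slot_of`, `step_index`, `step_ab`, `run`) and the BGK parameters are chosen for illustration only; a conventional two-array A-B step is included purely as a reference for comparison.

```python
# Toy illustration (assumed mapping, not the paper's formula):
# D1Q3 lattice, periodic domain, BGK collision, SoA storage.
C = (0, 1, -1)            # lattice velocities
W = (2 / 3, 1 / 6, 1 / 6) # lattice weights
TAU = 0.8                 # relaxation time (arbitrary choice)

def soa(q, slot, n):
    """Structure-of-arrays address: all slots of direction q are contiguous."""
    return q * n + slot

def slot_of(q, x, t, n):
    """Hypothetical index map: slot that holds f_q at position x and time t."""
    return (x - C[q] * t) % n

def collide(f):
    """BGK relaxation of the three PDFs at one node."""
    rho = sum(f)
    u = sum(c * fi for c, fi in zip(C, f)) / rho
    out = []
    for c, w, fi in zip(C, W, f):
        feq = w * rho * (1 + 3 * c * u + 4.5 * (c * u) ** 2 - 1.5 * u * u)
        out.append(fi - (fi - feq) / TAU)
    return out

def step_index(f, t, n):
    """Single-array step: read once, collide, write back to the same slots.
    Streaming is implied by evaluating slot_of with t + 1 at the next step."""
    for x in range(n):
        idx = [soa(q, slot_of(q, x, t, n), n) for q in range(3)]
        post = collide([f[i] for i in idx])
        for i, v in zip(idx, post):
            f[i] = v

def step_ab(src, n):
    """Reference two-array (A-B) step with explicit streaming."""
    dst = [0.0] * len(src)
    for x in range(n):
        post = collide([src[soa(q, x, n)] for q in range(3)])
        for q in range(3):
            dst[soa(q, (x + C[q]) % n, n)] = post[q]
    return dst

def run(n=16, steps=10):
    """Advance both variants and return them in physical (q, x) ordering."""
    init = [W[q] * (1.0 + 0.1 * (x == n // 2)) for q in range(3) for x in range(n)]
    f_idx, f_ab = list(init), list(init)
    for t in range(steps):
        step_index(f_idx, t, n)
        f_ab = step_ab(f_ab, n)
    unshifted = [f_idx[soa(q, slot_of(q, x, steps, n), n)]
                 for q in range(3) for x in range(n)]
    return unshifted, f_ab
```

Calling `run()` returns the single-array result, mapped back to physical ordering, alongside the two-array reference, so the two can be compared entry by entry. On a GPU the loop over `x` would become one thread per node, and the contiguous per-direction storage from `soa` is what allows neighbouring threads to access neighbouring addresses.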
