4.7 Article

GPU implementation of the discrete unified gas kinetic scheme for low-speed isothermal flows

Journal

COMPUTER PHYSICS COMMUNICATIONS
Volume 294, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.cpc.2023.108908

Keywords

DUGKS; Multi-scale flows; GPU acceleration; CUDA

This paper proposes two GPU parallel algorithms for simulating low-speed isothermal flows with the discrete unified gas kinetic scheme (DUGKS). Both are evaluated on benchmark problems and show satisfactory computational efficiency. Algorithm-I performs better when the physical mesh is large, while Algorithm-II is more efficient on coarser meshes with a medium number of discrete velocities.
In this paper, two GPU parallel algorithms are proposed for the discrete unified gas kinetic scheme (DUGKS) for simulating low-speed isothermal flows. Algorithm-I uses a two-level fine-grain technique to parallelize over the physical space, while Algorithm-II applies this technique to both the physical space and the particle velocity space. To evaluate the performance of the proposed algorithms, several typical benchmark problems are simulated, including two-dimensional (2D) and three-dimensional (3D) lid-driven cavity flows, as well as micro channel and cavity flows. Numerical results show that the GPU algorithms achieve satisfactory computational efficiency. For Algorithm-I, the speedup reaches 250 and 338 on a Tesla V100 GPU card for the 2D and 3D continuum cavity flows, respectively, and a hundredfold acceleration is obtained for the rarefied cases. For Algorithm-II, a speedup of about 70 is attained for rarefied cases; it is not applied to continuum problems, which require only a small number of velocity points. The two GPU algorithms are also compared for rarefied flows with various grid meshes and numbers of velocity directions. The results show that Algorithm-I performs better when the physical mesh is large, while Algorithm-II provides higher efficiency for a coarser mesh with a medium number of discrete velocities. Algorithm-I is further compared with an MPI parallelization on 128 CPU cores based on a physical space discretization approach, and Algorithm-I on a V100 GPU is found to have a clear advantage for sparse physical grids in both continuum and rarefied cases. (c) 2023 Elsevier B.V. All rights reserved.
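
The abstract does not give kernel-level details, so the following is only a rough illustration of the difference between the two work decompositions it describes, not the authors' implementation. It emulates serially, in Python, how a velocity-space moment could be computed with one work item per physical cell (in the spirit of Algorithm-I) versus one work item per (cell, velocity) pair followed by a reduction (in the spirit of Algorithm-II). The function names, the flat memory layout, and the use of quadrature weights are all assumptions made for this sketch.

```python
# Hypothetical sketch: serial emulation of two GPU work decompositions.
# All names and the flat layout f[cell * n_vel + k] are illustrative
# assumptions; they are not taken from the paper.

def moment_per_cell(f, weights, n_cells, n_vel):
    """Algorithm-I-like mapping: one work item per physical cell.

    Each work item loops over every discrete velocity by itself.
    """
    rho = [0.0] * n_cells
    for cell in range(n_cells):            # one GPU thread per iteration
        acc = 0.0
        for k in range(n_vel):             # serial loop inside that thread
            acc += weights[k] * f[cell * n_vel + k]
        rho[cell] = acc
    return rho


def moment_per_cell_velocity(f, weights, n_cells, n_vel):
    """Algorithm-II-like mapping: one work item per (cell, velocity) pair.

    Each work item does one multiply; a reduction over the velocity
    index then produces the per-cell moment.
    """
    partial = [0.0] * (n_cells * n_vel)
    for tid in range(n_cells * n_vel):     # one GPU thread per iteration
        cell, k = divmod(tid, n_vel)       # recover both indices from flat id
        partial[tid] = weights[k] * f[cell * n_vel + k]
    # Reduction over velocities (a real GPU code would do this in parallel).
    return [sum(partial[c * n_vel:(c + 1) * n_vel]) for c in range(n_cells)]
```

Under these assumptions, the second mapping exposes n_cells * n_vel independent work items instead of n_cells, at the cost of an extra reduction. This is consistent with the abstract's observation that Algorithm-II pays off on coarser meshes with a medium number of discrete velocities, while Algorithm-I is preferable when the physical mesh alone is large.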
