4.5 Article

GPU_PBTE: an efficient solver for three and four phonon scattering rates on graphics processing units

Journal

JOURNAL OF PHYSICS-CONDENSED MATTER
Volume 33, Issue 49, Pages -

Publisher

IOP PUBLISHING LTD
DOI: 10.1088/1361-648X/ac268d

Keywords

phonon Boltzmann transport equation; graphics processing units; thermal conductivity; phonon scattering

Funding

  1. National Natural Science Foundation of China [51706134]
  2. Center for High Performance Computing at Shanghai Jiao Tong University

This paper introduces a new algorithm optimized for calculating lattice thermal conductivity on GPUs, making double-precision calculations two to three orders of magnitude faster than on a CPU. A new open-source code, GPU_PBTE, has been developed for studying the thermal transport properties of materials.
Lattice thermal conductivity (LTC) is a key parameter for many technological applications. Studies based on the Peierls-Boltzmann transport equation (PBTE) have revealed many unique phonon transport properties in a variety of materials. Accurate calculation of LTC with the PBTE, however, is time-consuming, especially for compounds with a complex crystal structure or when high-order phonon scattering is taken into account. Graphics processing units (GPUs) have been used extensively to accelerate scientific simulations, making it possible to run on a single desktop workstation calculations that used to require supercomputers. Because they differ fundamentally from traditional processors, GPUs are especially suited to executing a large group of similar tasks with minimal communication, but they require a completely different algorithm design. In this paper, we provide a new algorithm optimized for GPUs, in which a two-kernel method is used to avoid divergent branching. A new open-source code, GPU_PBTE, is developed based on the proposed algorithm. As demonstrations, we investigate the thermal transport properties of silicon and silicon carbide, and find that our software yields accurate and reliable LTC. Running on an NVIDIA Tesla V100, GPU_PBTE substantially improves double-precision performance, making it two to three orders of magnitude faster than our CPU version running on an Intel Xeon Gold 6248 CPU @ 2.5 GHz. Our work also suggests how calculations might be accelerated on other novel hardware that may appear in the future.
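
The abstract does not say how the two kernels divide the work. One common way to avoid divergent branching, assumed here purely for illustration, is to select the allowed scattering processes in a first pass and to evaluate them in a second pass, so that every work item in the second pass follows the same code path. The sketch below shows that pattern on the CPU in plain Python. It is not GPU_PBTE source code: the function names, the toy frequencies, the restriction to processes with w_i + w_j = w_k, the omission of momentum conservation, and the placeholder `toy_weight` are all hypothetical.

```python
# Illustrative two-pass scheme; an assumption, not GPU_PBTE source code.

def select_triplets(freqs, tol):
    """Pass 1 (selection): keep index triplets (i, j, k) with
    |w_i + w_j - w_k| <= tol, i.e. energy-conserving three-phonon processes."""
    n = len(freqs)
    selected = []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if abs(freqs[i] + freqs[j] - freqs[k]) <= tol:
                    selected.append((i, j, k))
    return selected


def accumulate_rates(freqs, selected, weight):
    """Pass 2 (evaluation): every selected triplet does the same work,
    so no conservation test (and no branch) is needed here."""
    rates = [0.0] * len(freqs)
    for i, j, k in selected:
        rates[i] += weight(freqs[i], freqs[j], freqs[k])
    return rates


def toy_weight(wi, wj, wk):
    # Hypothetical stand-in for the real scattering matrix element
    # and phonon occupation factors.
    return 1.0 / (wi * wj * wk)


# Toy mode frequencies in arbitrary units (hypothetical values).
freqs = [1.0, 2.0, 3.0, 4.0]
selected = select_triplets(freqs, tol=1e-9)
rates = accumulate_rates(freqs, selected, toy_weight)
```

On a GPU, each pass would map naturally to its own kernel: the first writes a compacted list of allowed processes, and the second assigns one thread per list entry, so threads in the same warp do not wait on neighbors that failed the conservation test.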
