Article

Redesign and Accelerate the AIREBO Bond-Order Potential on the New Sunway Supercomputer

Journal

IEEE Transactions on Parallel and Distributed Systems

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPDS.2023.3321927

Keywords

Carbon; Mathematical models; Hydrocarbons; Atmospheric modeling; Supercomputers; Computational modeling; Bonding; High-performance computing; molecular dynamics; LAMMPS; AIREBO; computational science; the new generation Sunway supercomputer

This article presents a method for simulating carbon and hydrocarbon systems with the AIREBO potential in LAMMPS on the new Sunway supercomputer. Efficient simulation is achieved through a parallel two-level building scheme, a periodic buffering strategy, and optimized nearest-neighbor access algorithms.
Molecular dynamics (MD) is one of the most important computer simulation methods for understanding real-world processes at the atomic level. Reactive potentials based on the bond-order concept can model dynamic bond breaking and formation with close to quantum mechanical (QM) precision without requiring expensive QM calculations. In this article, we focus on the adaptive intermolecular reactive empirical bond-order (AIREBO) potential in LAMMPS for the simulation of carbon and hydrocarbon systems on the new Sunway supercomputer. To achieve scalable performance, we propose a parallel two-level building scheme and a periodic buffering strategy for the tailored data design to exploit data locality and data reuse. Furthermore, we design two optimized nearest-neighbor access algorithms: the redistribution of accumulated coefficients algorithm and the double-end search connectivity algorithm. Finally, we implement parallel force computation with an AoS data layout and a hardware/software co-cache. In addition, we design a low-overhead load-balancing method based on atomic operations and vectorize the computation. The optimized AIREBO implementation achieves a speedup of nearly $20\times$ on a single core group (CG), and of more than $5\times$ and $4\times$ over an Intel Xeon E5 2680 v3 core and an Intel Xeon Gold 6138 core, respectively. Compared with the Intel accelerator package in LAMMPS, our implementation reaches $3.0\times$ the performance of an Intel Xeon E5 2680 v3 core and outperforms an Intel Xeon Gold 6138 core. We complete the validation of the results in no more than 20.5 hours on a single node with 2,000,000 running steps (i.e., 1 ns). Our experiments show that the simulation of 2,139,095,040 atoms on 798,720 cores ((1 MPE + 64 CPEs) $\times$ 12,288 processes) exhibits a parallel efficiency of 88% under weak scaling.
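
The scale figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable and function names are hypothetical, and the weak-scaling efficiency formula is the common textbook definition (baseline time divided by scaled time at constant work per process), which is assumed here because the abstract does not state how the 88% figure was computed or what baseline was used.

```python
# Figures taken from the abstract.
PROCESSES = 12_288
CORES_PER_PROCESS = 1 + 64          # one MPE plus 64 CPEs per process
TOTAL_ATOMS = 2_139_095_040
VALIDATION_STEPS = 2_000_000
VALIDATION_NS = 1.0                 # simulated time covered by those steps


def total_cores(processes: int, cores_per_process: int) -> int:
    """Total core count for a run with one core group per process."""
    return processes * cores_per_process


def timestep_fs(simulated_ns: float, steps: int) -> float:
    """Timestep in femtoseconds implied by a simulated time and step count."""
    return simulated_ns * 1.0e6 / steps   # 1 ns = 1e6 fs


def weak_scaling_efficiency(t_base: float, t_scaled: float) -> float:
    """Assumed textbook definition: per-process work is held constant,
    so ideal scaling keeps the run time equal to the baseline time."""
    return t_base / t_scaled


cores = total_cores(PROCESSES, CORES_PER_PROCESS)
atoms_per_process, remainder = divmod(TOTAL_ATOMS, PROCESSES)

print(cores)                                        # 798720
print(atoms_per_process, remainder)                 # 174080 0
print(timestep_fs(VALIDATION_NS, VALIDATION_STEPS)) # 0.5
```

The numbers are mutually consistent: 65 cores per process times 12,288 processes gives the stated 798,720 cores, the atom count divides evenly into 174,080 atoms per process, and 2,000,000 steps covering 1 ns implies a 0.5 fs timestep.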
