Article

Optimization of Reactive Force Field Simulation: Refactor, Parallelization, and Vectorization for Interactions

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPDS.2021.3091408

Keywords

High performance computing; molecular dynamics; computational science; Sunway TaihuLight supercomputer (TaihuLight)

Funding

  1. National Key R&D Program of China [2020YFB0204700]
  2. NSFC [61972231, U1806205]
  3. Key Project of Joint Fund of Shandong Province [ZR2019LZH007]
  4. Key Research and Development Program of Shandong Province [2018CXGC1211]
  5. PPP project from CSC and DAAD
  6. Center for High Performance Computing and System Simulation, Pilot National Laboratory for Marine Science and Technology, Qingdao
  7. Engineering Research Center of Digital Media Technology, Ministry of Education, China

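Molecular dynamics simulations are increasingly important in fields ranging from chemical materials to biological molecules. This work optimizes the interaction computation of the ReaxFF potential in LAMMPS. A vectorized non-bonded kernel runs nearly 100x faster than the MPE on Sunway TaihuLight, and three-body-list-free torsion angle computation combined with a line-locked software cache removes write conflicts in the torsion and valence angle interactions, giving an order-of-magnitude speedup on a single node. The implementation is up to 3.5x faster than the KOKKOS package on an Intel Xeon Gold 6148 core and simulates 21,233,664 atoms on 66,560 cores with a weak scaling efficiency of 95.71 percent.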
Molecular dynamics (MD) simulations are playing an increasingly important role in many areas ranging from chemical materials to biological molecules. With the continuing development of MD models, the potentials are getting larger and more complex. In this article, we focus on the reactive force field (ReaxFF) potential from LAMMPS to optimize the computation of interactions. We present our efforts on refactoring for neighbor list building, bond order computation, as well as valence angles and torsion angles computation. After redesigning these kernels, we develop a vectorized implementation for non-bonded interactions, which is nearly 100x faster than the management processing element (MPE) on the Sunway TaihuLight supercomputer. Furthermore, we have implemented the three-body-list free torsion angles computation, and propose a line-locked software cache method to eliminate write conflicts in the torsion angle and valence angle interactions resulting in an order-of-magnitude speedup on a single Sunway TaihuLight node. In addition, we achieve a speedup of up to 3.5 compared to the KOKKOS package on an Intel Xeon Gold 6148 core. When executed on 1,024 processes, our implementation enables the simulation of 21,233,664 atoms on 66,560 cores with a performance of 0.032 ns/day and a weak scaling efficiency of 95.71 percent.
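The scale figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only: the helper names `per_process` and `weak_scaling_efficiency` are hypothetical, and the efficiency formula is the conventional definition (baseline time divided by scaled time at constant work per process), which the abstract does not spell out.

```python
def per_process(total: int, processes: int) -> int:
    """Split a total evenly across processes; fail if it does not divide."""
    if total % processes != 0:
        raise ValueError("total is not evenly divisible by process count")
    return total // processes


def weak_scaling_efficiency(t_base: float, t_scaled: float) -> float:
    """Conventional weak-scaling efficiency (an assumed definition):
    time on the baseline run divided by time on the scaled run,
    with the work per process held constant."""
    if t_base <= 0 or t_scaled <= 0:
        raise ValueError("timings must be positive")
    return t_base / t_scaled


# Figures taken from the abstract.
ATOMS = 21_233_664
CORES = 66_560
PROCESSES = 1_024
NS_PER_DAY = 0.032

atoms_per_process = per_process(ATOMS, PROCESSES)
cores_per_process = per_process(CORES, PROCESSES)
days_per_ns = 1.0 / NS_PER_DAY  # wall-clock days per simulated nanosecond

print(f"atoms per process: {atoms_per_process}")
print(f"cores per process: {cores_per_process}")
print(f"days per simulated ns: {days_per_ns:.2f}")
```

Both divisions come out even: 20,736 atoms and 65 cores per process. The 65 cores per process is consistent with one Sunway core group (one MPE plus 64 computing processing elements) per process, although that mapping is an inference and not something the abstract states.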
