Article

Towards electronic structure-based ab-initio molecular dynamics simulations with hundreds of millions of atoms

Journal

PARALLEL COMPUTING
Volume 111

Publisher

ELSEVIER
DOI: 10.1016/j.parco.2022.102920

Keywords

Supercomputing; High-performance computing; Massively-parallel algorithms; Large-scale linear algebra; Ab-initio molecular dynamics; Approximate computing

Funding

  1. Gauss Centre for Supercomputing e.V.
  2. Paderborn Center for Parallel Computing (PC2)
  3. Federal Ministry of Education and Research (BMBF)
  4. State of North Rhine-Westphalia as part of the NHR Program
  5. European Research Council (ERC) under the European Union [716142]
  6. Paderborn University's research award for "GreenIT"

This study pushes electronic structure-based AIMD beyond 100 million atoms by combining methodological innovations with a compensation scheme for the errors introduced by numerical approximations. With the NOLSM method, it achieves a sustained performance of 324 PFLOP/s on 1536 NVIDIA A100 GPUs.
We push the boundaries of electronic structure-based ab-initio molecular dynamics (AIMD) beyond 100 million atoms. This scale is otherwise barely reachable with classical force-field methods or novel neural network and machine learning potentials. We achieve this breakthrough by combining innovations in linear-scaling AIMD, efficient and approximate sparse linear algebra, low and mixed-precision floating-point computation on GPUs, and a compensation scheme for the errors introduced by numerical approximations. The core of our work is the non-orthogonalized local submatrix method (NOLSM), which scales very favorably to massively parallel computing systems and translates large sparse matrix operations into highly parallel, dense matrix operations that are ideally suited to hardware accelerators. We demonstrate that the NOLSM method, which is at the center of each AIMD step, achieves a sustained performance of 324 PFLOP/s in mixed FP16/FP32 precision, corresponding to an efficiency of 67.7%, when running on 1536 NVIDIA A100 GPUs.
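The idea of translating one large sparse matrix operation into many independent dense ones can be illustrated with a minimal sketch of a generic submatrix approach. This is not the authors' NOLSM implementation: it assumes plain NumPy arrays in FP64, uses a matrix inverse as a stand-in for the matrix function, and omits GPU batching, mixed precision, and error compensation entirely. The names `submatrix_apply`, `func`, and `tol` are hypothetical and chosen for illustration only.

```python
import numpy as np


def submatrix_apply(A, func, tol=0.0):
    """Approximate func(A) one column at a time from small dense submatrices.

    Illustrative sketch only (hypothetical helper, not the paper's code).
    A is stored densely here for readability; a real code would use a
    sparse format and process the submatrices in parallel.
    """
    n = A.shape[0]
    result = np.zeros_like(A, dtype=float)
    for i in range(n):
        # Rows coupled to column i, i.e. its non-negligible entries.
        idx = np.flatnonzero(np.abs(A[:, i]) > tol)
        # Always include i itself so the local column index is well defined.
        idx = np.union1d(idx, [i])
        # Dense principal submatrix spanned by those indices.
        sub = A[np.ix_(idx, idx)]
        # Small dense operation; each column is independent of the others.
        f_sub = func(sub)
        # Copy back only the column that belongs to i.
        local = np.searchsorted(idx, i)
        result[idx, i] = f_sub[:, local]
    return result


# Block-diagonal test matrix: each submatrix is exactly one diagonal block,
# so the column-wise result coincides with the full dense inverse.
A = np.array([
    [4.0, 1.0, 0.0, 0.0],
    [1.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.5],
    [0.0, 0.0, 0.5, 5.0],
])
approx = submatrix_apply(A, np.linalg.inv)
exact = np.linalg.inv(A)
```

For matrices that are sparse but not block-diagonal, the result is in general only an approximation of the full matrix function, restricted to the sparsity pattern of the input. Because every column is handled independently with a small dense operation, the loop body maps naturally onto batched dense kernels on accelerators.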
