4.7 Article

DFT-FE 1.0: A massively parallel hybrid CPU-GPU density functional theory code using finite-element discretization

Journal

COMPUTER PHYSICS COMMUNICATIONS
Volume 280, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.cpc.2022.108473

Keywords

Electronic structure; Real-space; Spectral finite-elements; Mixed-precision arithmetic; Pseudopotential; All-electron; GPU

Funding

  1. Department of Energy, Office of Basic Energy Sciences [DE-SC0008637]
  2. Toyota Research Institute
  3. Air Force Office of Scientific Research [FA9550-21-1-0302]
  4. DOE Office of Science User Facility [DE-AC05-00OR22725]
  5. Office of Science of the U.S. Department of Energy [DE-AC02-05CH11231]
  6. National Science Foundation [ACI-1053575]
  7. Army Research Office through the DURIP [W911NF1810242]
  8. Indian Institute of Science and SERB Startup Research
  9. Department of Science and Technology India [SRG/2020/002194]
  10. U.S. Department of Defense (DOD) [W911NF1810242]
  11. U.S. Department of Energy (DOE) [DE-SC0008637]


DFT-FE 1.0 is an improved version of DFT-FE 0.6, which allows for fast and accurate large-scale DFT calculations on various computing architectures. The improvements include enhanced treatment of electrostatic interactions and GPU acceleration. The method has been shown to be accurate and efficient in benchmark systems.
We present DFT-FE 1.0, building on DFT-FE 0.6 (Motamarri et al., 2020 [28]), to conduct fast and accurate large-scale density functional theory (DFT) calculations (reaching ~100,000 electrons) on both many-core CPU and hybrid CPU-GPU computing architectures. This work involves improvements in the real-space formulation, via an improved treatment of the electrostatic interactions that substantially enhances the computational efficiency, as well as in high-performance computing aspects, including the GPU acceleration of all the key compute kernels in DFT-FE. We demonstrate the accuracy by comparing the ground-state energies, ionic forces and cell stresses on a wide range of benchmark systems against those obtained from widely used DFT codes. Further, we demonstrate the numerical efficiency of our GPU acceleration, which yields ~20x speed-up on hybrid CPU-GPU nodes of the Summit supercomputer. Notably, owing to the parallel scaling of the GPU implementation, we obtain wall-times of 80-140 seconds for full ground-state calculations, with stringent accuracy, on benchmark systems containing ~6,000-15,000 electrons using 64-224 nodes of the Summit supercomputer.

Program summary

Program Title: DFT-FE
CPC Library link to program files: https://doi.org/10.17632/c5ghfc6ctn.1
Developer's repository link: https://github.com/dftfeDevelopers/dftfe
Licensing provisions: LGPL v3
Programming language: C/C++
External routines/libraries: p4est (http://www.p4est.org/), deal.II (https://www.dealii.org/), BLAS (http://www.netlib.org/blas/), LAPACK (http://www.netlib.org/lapack/), ELPA (https://elpa.mpcdf.mpg.de/), ScaLAPACK (http://www.netlib.org/scalapack/), Spglib (https://atztogo.github.io/spglib), ALGLIB (http://www.alglib.net/), LIBXC (http://www.tddft.org/programs/libxc/), PETSc (https://www.mcs.anl.gov/petsc), SLEPc (http://slepc.upv.es), NCCL (optional, https://github.com/NVIDIA/nccl).
Nature of problem: Density functional theory calculations.
Solution method: We employ a local real-space variational formulation of Kohn-Sham density functional theory that is applicable to both pseudopotential and all-electron calculations on periodic, semi-periodic and non-periodic geometries. A higher-order adaptive spectral finite-element basis is used to discretize the Kohn-Sham equations. A Chebyshev polynomial filtered subspace iteration procedure (ChFSI) is employed to solve the nonlinear Kohn-Sham eigenvalue problem self-consistently. ChFSI in DFT-FE employs Cholesky factorization based orthonormalization and a spectrum-splitting based Rayleigh-Ritz procedure, in conjunction with mixed-precision arithmetic. A configurational force approach is used to compute ionic forces and periodic cell stresses for geometry optimization.
Additional comments including restrictions and unusual features: Exchange-correlation functionals are restricted to the Local Density Approximation (LDA) and the Generalized Gradient Approximation (GGA), with and without spin. The pseudopotentials available are optimized norm-conserving Vanderbilt (ONCV) pseudopotentials and Troullier-Martins (TM) pseudopotentials. Calculations are non-relativistic. DFT-FE handles all-electron and pseudopotential calculations in the same framework, while accommodating periodic, non-periodic and semi-periodic boundary conditions.

(C) 2022 Elsevier B.V. All rights reserved.
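
The solution method above names three building blocks of ChFSI: Chebyshev polynomial filtering, Cholesky factorization based orthonormalization, and a Rayleigh-Ritz step. The sketch below illustrates how these pieces fit together on a toy problem; it is not DFT-FE code. Everything in it is an assumption made for illustration: the operator is the fixed tridiagonal matrix tridiag(-1, 2, -1) rather than a finite-element Kohn-Sham Hamiltonian, the names (`apply_h`, `chebyshev_filter`, `cholesky_orthonormalize`, `rayleigh_ritz`, `chfsi`) and parameters are hypothetical, the small projected eigenproblem is solved with plain Jacobi rotations, and the self-consistent loop, spectrum splitting, mixed precision and all parallelism are omitted.

```python
import math
import random

N = 30        # size of the toy operator (assumption, not a DFT-FE parameter)
BLOCK = 6     # subspace size: 4 wanted states plus 2 buffer states
DEGREE = 10   # Chebyshev polynomial degree
UPPER = 4.0   # Gershgorin upper bound on the toy operator's spectrum

def apply_h(v):
    """Matrix-free action of tridiag(-1, 2, -1), a stand-in for the Hamiltonian."""
    n = len(v)
    return [2.0 * v[i] - (v[i - 1] if i > 0 else 0.0)
            - (v[i + 1] if i < n - 1 else 0.0) for i in range(n)]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def chebyshev_filter(v, degree, a, b):
    """Apply T_degree((H - c) / e) to v; components with eigenvalues in [a, b] stay bounded."""
    e, c = (b - a) / 2.0, (b + a) / 2.0
    def shifted(u):
        return [(hu - c * x) / e for hu, x in zip(apply_h(u), u)]
    t_prev, t_curr = v, shifted(v)
    for _ in range(2, degree + 1):  # three-term recurrence T_{k+1} = 2 t T_k - T_{k-1}
        t_next = [2.0 * x - y for x, y in zip(shifted(t_curr), t_prev)]
        t_prev, t_curr = t_curr, t_next
    return t_curr

def cholesky(s):
    """Lower-triangular factor L with s = L L^T."""
    k = len(s)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            r = s[i][j] - sum(low[i][p] * low[j][p] for p in range(j))
            low[i][j] = math.sqrt(r) if i == j else r / low[j][j]
    return low

def cholesky_orthonormalize(vectors):
    """Orthonormalize via the overlap matrix S = X^T X = L L^T, then Q = X L^{-T}."""
    k = len(vectors)
    overlap = [[dot(vectors[i], vectors[j]) for j in range(k)] for i in range(k)]
    low = cholesky(overlap)
    ortho = []
    for j in range(k):  # forward substitution, one column of Q at a time
        q = list(vectors[j])
        for i in range(j):
            q = [x - low[j][i] * y for x, y in zip(q, ortho[i])]
        ortho.append([x / low[j][j] for x in q])
    return ortho

def jacobi_eigh(a, sweeps=12):
    """Eigenvalues and eigenvectors (columns of v) of a small symmetric matrix."""
    k = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(k)] for i in range(k)]
    for _ in range(sweeps):
        for p in range(k - 1):
            for q in range(p + 1, k):
                d = a[q][q] - a[p][p]
                sgn = 1.0 if d >= 0.0 else -1.0
                theta = 0.5 * math.atan2(2.0 * a[p][q] * sgn, abs(d))  # |theta| <= pi/4
                c, s = math.cos(theta), math.sin(theta)
                for m in (a, v):  # column update: M <- M J
                    for row in m:
                        row[p], row[q] = c * row[p] - s * row[q], s * row[p] + c * row[q]
                a[p], a[q] = ([c * x - s * y for x, y in zip(a[p], a[q])],
                              [s * x + c * y for x, y in zip(a[p], a[q])])  # A <- J^T A
    return [a[i][i] for i in range(k)], v

def rayleigh_ritz(ortho):
    """Project H onto the orthonormal block, diagonalize, and rotate the block."""
    k, n = len(ortho), len(ortho[0])
    h_vecs = [apply_h(q) for q in ortho]
    proj = [[dot(ortho[i], h_vecs[j]) for j in range(k)] for i in range(k)]
    values, vecs = jacobi_eigh(proj)
    order = sorted(range(k), key=lambda j: values[j])
    rotated = [[sum(vecs[i][j] * ortho[i][r] for i in range(k)) for r in range(n)]
               for j in order]
    return [values[j] for j in order], rotated

def chfsi(num_iterations=30, seed=0):
    """Filter, orthonormalize, Rayleigh-Ritz; repeat for a fixed operator."""
    rng = random.Random(seed)
    block = [[rng.uniform(-1.0, 1.0) for _ in range(N)] for _ in range(BLOCK)]
    ritz, block = rayleigh_ritz(cholesky_orthonormalize(block))
    for _ in range(num_iterations):
        lower = ritz[-1]  # damp the spectrum above the largest current Ritz value
        block = [chebyshev_filter(v, DEGREE, lower, UPPER) for v in block]
        ritz, block = rayleigh_ritz(cholesky_orthonormalize(block))
    return ritz
```

For this toy operator the exact eigenvalues are 2 - 2 cos(j pi / (N + 1)) for j = 1, ..., N, so the lowest entries of `chfsi()` can be checked against that formula. The two extra buffer states in `BLOCK` are a choice made for this sketch to keep the wanted states away from the edge of the damped interval.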
