Article

A massively parallel tensor contraction framework for coupled-cluster computations

Journal

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
Volume 74, Issue 12, Pages 3176-3190

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.jpdc.2014.06.002

Keywords

Coupled-cluster; Tensor contractions; Matrix multiplication; Topology-aware mapping; Communication-avoiding algorithms

Funding

  1. Department of Energy Computational Science Graduate Fellowship [DE-FG02-97ER25308]
  2. Microsoft [024263]
  3. Intel [024894]
  4. U.C. Discovery [DIG07-10227]
  5. DOE [DE-SC0004938, DE-SC0005136, DE-SC0003959, DE-SC0008700, AC02-05CH11231]
  6. DARPA [HR0011-12-2-0016]
  7. Office of Science of the US Department of Energy [DE-AC02-06CH11357, DE-AC02-05CH11231]
  8. ParLab affiliates: National Instruments
  9. Nokia
  10. NVIDIA
  11. Oracle
  12. Samsung
  13. MathWorks
  14. U.S. Department of Energy (DOE) [DE-SC0005136, DE-SC0008700]

Abstract

Precise calculation of molecular electronic wavefunctions by methods such as coupled-cluster requires the computation of tensor contractions, whose cost scales polynomially with the system and basis set sizes. Each contraction may be executed via matrix multiplication on a properly ordered and structured tensor. However, data transpositions are often needed to reorder the tensors for each contraction. Writing and optimizing distributed-memory kernels for each transposition and contraction is tedious, since the number of contractions scales combinatorially with the number of tensor indices. We present a distributed-memory numerical library, Cyclops Tensor Framework (CTF), that automatically manages tensor blocking and redistribution to perform any user-specified contractions. CTF serves as the distributed-memory contraction engine in Aquarius, a new program designed for high-accuracy and massively parallel quantum chemical computations. Aquarius implements a range of coupled-cluster and related methods, such as CCSD and CCSDT, by writing the equations on top of a C++ templated domain-specific language (DSL). This DSL calls CTF directly to manage the data and perform the contractions. Our CCSD and CCSDT implementations achieve high parallel scalability on the BlueGene/Q and Cray XC30 supercomputer architectures, showing that accurate electronic structure calculations can be effectively carried out on top of general distributed-memory tensor primitives. (C) 2014 Elsevier Inc. All rights reserved.
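
The abstract's point that a contraction reduces to a matrix multiplication once the tensors are transposed into a suitable index order can be illustrated with a small single-process sketch. This is not CTF's or Aquarius's API: the function names, the index pattern C[i][j] = sum over k,l of A[k][i][l] * B[l][j][k], and the test data are all assumptions chosen for illustration, and nothing here is distributed or blocked.

```python
# Illustrative sketch only (hypothetical names, not the CTF API).
# Contraction: C[i][j] = sum over k, l of A[k][i][l] * B[l][j][k]

def matmul(X, Y):
    """Plain dense matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def contract_direct(A, B, I, J, K, L):
    """Reference result from the explicit sum over the contracted indices."""
    return [[sum(A[k][i][l] * B[l][j][k]
                 for k in range(K) for l in range(L))
             for j in range(J)]
            for i in range(I)]

def contract_via_matmul(A, B, I, J, K, L):
    """Same contraction as transpose + fold + one matrix multiplication."""
    # Transpose A from (k, i, l) to (i, k, l), then fold (k, l) into a
    # single column index k*L + l, giving an I x (K*L) matrix.
    A_mat = [[A[k][i][l] for k in range(K) for l in range(L)]
             for i in range(I)]
    # Transpose B from (l, j, k) to (k, l, j), then fold (k, l) into a
    # single row index k*L + l, giving a (K*L) x J matrix.
    B_mat = [[B[l][j][k] for j in range(J)]
             for k in range(K) for l in range(L)]
    return matmul(A_mat, B_mat)

# Small integer test tensors (arbitrary values, exact arithmetic).
I, J, K, L = 2, 3, 2, 2
A = [[[k + 2 * i + 3 * l for l in range(L)] for i in range(I)]
     for k in range(K)]                      # indexed A[k][i][l]
B = [[[(l + 1) * (j + 2) - k for k in range(K)] for j in range(J)]
     for l in range(L)]                      # indexed B[l][j][k]

C = contract_via_matmul(A, B, I, J, K, L)
print(C == contract_direct(A, B, I, J, K, L))
```

The two comprehensions that build `A_mat` and `B_mat` are the data transpositions the abstract refers to. Each distinct index pattern needs its own pair of reorderings, which is why hand-writing such kernels for every contraction becomes tedious.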
