4.7 Article

OpenMP, OpenMP/MPI, and CUDA/MPI C programs for solving the time-dependent dipolar Gross-Pitaevskii equation

Journal

COMPUTER PHYSICS COMMUNICATIONS
Volume 209, Pages 190-196

Publisher

ELSEVIER
DOI: 10.1016/j.cpc.2016.07.029

Keywords

Bose-Einstein condensate; Dipolar atoms; Gross-Pitaevskii equation; Split-step Crank-Nicolson scheme; C program; OpenMP; GPU; CUDA program; MPI

Funding

  1. Ministry of Education, Science, and Technological Development of the Republic of Serbia [ON171017, 0I1611005, III43007]
  2. SCOPES project [IZ74Z0-160453]
  3. FAPESP of Brazil [2012/21871-7, 2014/16363-8, 2012/00451-0]
  4. Science and Engineering Research Board, Department of Science and Technology, Government of India [EMR/2014/000644]
  5. CNPq of Brazil [303280/2014-0]
  6. Swiss National Science Foundation (SNF) [IZ74Z0_160453]

Abstract

We present new versions of the previously published C and CUDA programs for solving the dipolar Gross-Pitaevskii equation in one, two, and three spatial dimensions, which calculate stationary and nonstationary solutions by propagation in imaginary or real time. The presented programs are improved and parallelized versions of the previous programs, divided into three packages according to the type of parallelization. The first package contains improved and threaded versions of the sequential C programs using OpenMP. The second package additionally parallelizes the three-dimensional variants of the OpenMP programs using MPI, allowing them to run on distributed-memory systems. Finally, the previous three-dimensional CUDA-parallelized programs are further parallelized using MPI, similarly to the OpenMP programs. We also present speedup test results obtained with the new versions of the programs, compared with the previous sequential C and parallel CUDA programs. The improvements to the sequential version yield a speedup of 1.1-1.9, depending on the program. OpenMP parallelization yields a further speedup of 2-12 on a 16-core workstation, while the OpenMP/MPI version demonstrates a speedup of 11.5-16.5 on a computer cluster with 32 nodes. The CUDA/MPI version shows a speedup of 9-10 on a computer cluster with 32 nodes.
