Article

Redesigning and Optimizing UCSF DOCK3.7 on Sunway TaihuLight

Journal

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPDS.2022.3194916

Keywords

Memory management; Optimization; Supercomputers; Mathematical models; Graphics processing units; Field programmable gate arrays; Drugs; Molecular docking; drug discovery; heterogeneous architecture; high-performance computing

Funding

  1. NSFC [61972231, U1806205]
  2. Key Project of Joint Fund of Shandong Province [ZR2019LZH00]
  3. PPP Project from CSC
  4. DAAD
  5. Engineering Research Center of Digital Media Technology, Ministry of Education, China

This paper presents the porting and optimization of UCSF DOCK3.7 on the Sunway TaihuLight supercomputer. Several strategies, such as the producer-consumer strategy, a new binary file format, and ligand orientation scoring optimization, are employed to improve the performance and efficiency of molecular docking.
Molecular docking is the process of posing, scoring, and ranking small molecules at the binding sites of proteins to prioritize compounds for experimental testing. It is a widely used computational method in drug discovery. However, it is highly time-consuming, since favorable ligand orientations for a receptor may have to be searched for among billions of ligands. UCSF DOCK3.7 is one of the most widely used molecular docking applications. In this paper, we port and optimize UCSF DOCK3.7 on the Sunway TaihuLight supercomputer. To mitigate load imbalance, we employ a producer-consumer strategy that overlaps I/O with computation. Furthermore, we present a new binary file format to replace the mol2db2 file format for ligand storage and adopt xzip rather than gzip to compress ligand files. We show that our file format reduces I/O time significantly, while xzip saves significant storage. For the routines that determine the orientation of a ligand relative to the receptor, we present an improved algorithm that discards geometrically similar orientations. We also fuse loops and compress memory usage so that data fits in the fast Local Device Memory (LDM), which allows ligand orientations to be scored with high efficiency. In addition, we propose a number of architecture-specific optimizations: asynchronous data transfer and vectorization of computation are implemented to take full advantage of the SW26010 processor. Our experiments show that a speedup of 167 can be achieved with the proposed strategies. Compared to a single core of an Intel(R) Core(TM) i9-10900K CPU, our approach achieves a speedup of 15 on one SW26010 core group. Furthermore, our implementation shows strong scalability to hundreds of thousands of heterogeneous cores on the next-generation Sunway supercomputer.
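
The producer-consumer strategy mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation, which targets the SW26010 processor; it is a generic Python illustration, and all names in it (`read_ligands`, `score_ligands`, `dock_all`, the placeholder "ligand" tuples, and the placeholder score) are hypothetical. One producer reads ligands into a bounded queue while several consumers pull work as soon as they are free, so I/O overlaps with scoring and slow ligands do not stall the other workers.

```python
import queue
import threading

SENTINEL = None  # marks the end of the ligand stream for one consumer


def read_ligands(paths, work_queue, n_consumers):
    """Producer: load ligands and hand them to the consumers."""
    for path in paths:
        # Placeholder for real file I/O: a "ligand" is just (name, size).
        ligand = (path, len(path))
        work_queue.put(ligand)  # blocks while the bounded queue is full
    for _ in range(n_consumers):
        work_queue.put(SENTINEL)  # one stop signal per consumer


def score_ligands(work_queue, results, lock):
    """Consumer: pull ligands as they arrive and score them."""
    while True:
        ligand = work_queue.get()
        if ligand is SENTINEL:
            break
        name, size = ligand
        score = -float(size)  # placeholder, not a real docking score
        with lock:
            results[name] = score


def dock_all(paths, n_consumers=4, queue_size=8):
    """Run one producer (this thread) and n_consumers scoring threads."""
    work_queue = queue.Queue(maxsize=queue_size)
    results = {}
    lock = threading.Lock()
    consumers = [
        threading.Thread(target=score_ligands, args=(work_queue, results, lock))
        for _ in range(n_consumers)
    ]
    for thread in consumers:
        thread.start()
    read_ligands(paths, work_queue, n_consumers)
    for thread in consumers:
        thread.join()
    return results
```

Because each consumer takes the next ligand only when it has finished the previous one, work is distributed dynamically rather than split up front, which is the property that addresses load imbalance in this kind of scheme.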