Article

Redesigning and Optimizing UCSF DOCK3.7 on Sunway TaihuLight

Journal

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
Volume 33, Issue 12, Pages 4458-4471

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPDS.2022.3194916

Keywords

Memory management; Optimization; Supercomputers; Mathematical models; Graphics processing units; Field programmable gate arrays; Drugs; Molecular docking; drug discovery; heterogeneous architecture; high-performance computing

Funding

  1. NSFC [61972231, U1806205]
  2. Key Project of Joint Fund of Shandong Province [ZR2019LZH00]
  3. PPP Project from CSC
  4. DAAD
  5. Engineering Research Center of Digital Media Technology, Ministry of Education, China

This paper presents the porting and optimization of UCSF DOCK3.7 on the Sunway TaihuLight supercomputer. Several strategies, such as a producer-consumer strategy, a new binary file format, and ligand orientation scoring optimization, are employed to improve the performance and efficiency of molecular docking.

Molecular docking is the process of posing, scoring, and ranking small molecules at the binding sites of proteins to prioritize compounds for experimental testing. It is a widely used computational method in the drug discovery process. However, it is highly time-consuming, because favorable ligand orientations may have to be searched across billions of ligands for a single receptor. UCSF DOCK3.7 is one of the most widely used molecular docking applications. In this paper, we port and optimize UCSF DOCK3.7 on the Sunway TaihuLight supercomputer. To avoid the impact of load imbalance, we employ a producer-consumer strategy that overlaps I/O and computation to achieve high performance. Furthermore, we present a new binary file format to replace the mol2db2 file format for ligand storage and adopt xzip rather than gzip to compress ligand files. We show that our file format reduces I/O time significantly, while xzip saves significant storage. For the routines that determine the orientation of a ligand relative to the receptor, we present an improved algorithm to discard geometrically similar orientations. We also fuse loops and compress memory usage so that data can be stored in the fast Local Device Memory (LDM), allowing ligand orientations to be scored with high efficiency. In addition, we propose a number of architecture-specific optimizations: asynchronous data transfer and vectorization of computation are implemented to take full advantage of the SW26010 processor. Our experiments show that a speedup of 167 can be achieved by using the proposed strategies. Compared to one core of an Intel(R) Core(TM) i9-10900K CPU, our approach achieves a speedup of 15 on a SW26010 core group. Furthermore, our implementation achieves strong scalability to hundreds of thousands of heterogeneous cores on the next-generation Sunway supercomputer.
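
The producer-consumer strategy mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, which targets the SW26010 processor. It only shows the general pattern: one thread reads ligand batches while several worker threads score them, so loading and scoring overlap and faster workers naturally pick up more batches. The names load_batches, score, and run_pipeline, the batch sizes, and the placeholder scoring formula are all assumptions made for illustration.

```python
import queue
import threading


def load_batches(n_batches, batch_size):
    """Hypothetical stand-in for reading ligand files: yields lists of ligand ids."""
    for b in range(n_batches):
        yield [b * batch_size + i for i in range(batch_size)]


def score(ligand_id):
    """Placeholder score (assumption); a real docking score would go here."""
    return (ligand_id * 37) % 101


def run_pipeline(n_batches=8, batch_size=4, n_workers=3, queue_depth=2):
    # Bounded queue: the producer can run only a few batches ahead of the consumers.
    work = queue.Queue(maxsize=queue_depth)
    results = {}
    lock = threading.Lock()

    def producer():
        for batch in load_batches(n_batches, batch_size):
            work.put(batch)          # blocks while the queue is full
        for _ in range(n_workers):
            work.put(None)           # one stop sentinel per consumer

    def consumer():
        while True:
            batch = work.get()       # blocks until a batch (or sentinel) arrives
            if batch is None:
                break
            scored = {lig: score(lig) for lig in batch}
            with lock:               # protect the shared result dictionary
                results.update(scored)

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=consumer) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    scores = run_pipeline()
    print(len(scores), "ligands scored")
```

Because consumers pull work from a shared queue rather than receiving a fixed share up front, a worker that finishes a cheap batch early simply takes the next one, which is the load-balancing property the abstract attributes to this strategy.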
