4.5 Article

Scalability and efficiency challenges for the exascale supercomputing system: practice of a parallel supporting environment on the Sunway exascale prototype system

Publisher

ZHEJIANG UNIV PRESS
DOI: 10.1631/FITEE.2200412

Keywords

Parallel computing; Sunway; Ultra-large-scale; Supercomputer; TP302

With the advancement of supercomputer performance and the integration of artificial intelligence with traditional scientific computing, parallel applications have scaled from millions to tens of millions of computing cores. This poses challenges in achieving high scalability and efficiency on super-large-scale systems. This paper analyzes the challenges faced by parallel applications in the exascale era using the Sunway exascale prototype system as an example. It highlights optimization technologies employed in the parallel supporting environment software, such as the parallel operating system, I/O optimization technology, debugging technology, parallel algorithm, and mixed-precision method. The contributions to various applications running on the Sunway exascale prototype system are also discussed, showcasing the effectiveness of the parallel supporting environment design.
As supercomputer performance continues to improve and artificial intelligence is integrated with traditional scientific computing, application scale is growing from millions to tens of millions of computing cores, which makes it difficult for parallel applications to achieve high scalability and efficiency on super-large-scale systems. Taking the Sunway exascale prototype system as an example, this paper first analyzes the challenges of high scalability and high efficiency for parallel applications in the exascale era. It then highlights the optimization technologies used in the parallel supporting environment software on the Sunway exascale prototype system to overcome these challenges: the parallel operating system, input/output (I/O) optimization technology, ultra-large-scale parallel debugging technology, the 10-million-core parallel algorithm, and the mixed-precision method. The parallel operating system and I/O optimization technology mainly support large-scale system scaling, while the ultra-large-scale parallel debugging technology, 10-million-core parallel algorithm, and mixed-precision method mainly enhance the efficiency of large-scale applications. Finally, the contributions to various applications running on the Sunway exascale prototype system are introduced, verifying the effectiveness of the parallel supporting environment design.
