4.7 Article

Performance-Aware Model for Sparse Matrix-Matrix Multiplication on the Sunway TaihuLight Supercomputer

Journal

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPDS.2018.2871189

Keywords

Heterogeneous many-core processor; parallelism; performance analysis; performance-aware; SpGEMM; Sunway TaihuLight supercomputer

Funding

  1. National Key R&D Program of China [2016YFB0200201]
  2. National Outstanding Youth Science Program of National Natural Science Foundation of China [61625202]
  3. International (Regional) Cooperation and Exchange Program of National Natural Science Foundation of China [61661146006, 61860206011]
  4. Program of National Natural Science Foundation of China [61751204, 61806077]

Ask authors/readers for more resources

General sparse matrix-sparse matrix multiplication (SpGEMM) is one of the fundamental linear operations in a wide variety of scientific applications. To implement efficient SpGEMM for many large-scale applications, this paper proposes scalable and optimized SpGEMM kernels based on COO, CSR, ELL, and CSC formats on the Sunway TaihuLight supercomputer. First, a multi-level parallelism design for SpGEMM is proposed to exploit the parallelism of over 10 millions cores and better control memory based on the special Sunway architecture. Optimization strategies, such as load balance, coalesced DMA transmission, data reuse, vectorized computation, and parallel pipeline processing, are applied to further optimize performance of SpGEMM kernels. Second, we thoroughly analyze the performance of the proposed kernels. Third, a performance-aware model for SpGEMM is proposed to select the most appropriate compressed storage formats for the sparse matrices that can achieve the optimal performance of SpGEMM on the Sunway. The experimental results show the SpGEMM kernels have good scalability and meet the challenge of the high-speed computing of large-scale data sets on the Sunway. In addition, the performance-aware model for SpGEMM achieves an absolute value of relative error rate of 8.31 percent on average when the kernels are executed in one single process and achieves 8.59 percent on average when the kernels are executed in multiple processes. It is proved that the proposed performance-aware model can perform at high accuracy and satisfies the precision of selecting the best formats for SpGEMM on the Sunway TaihuLight supercomputer.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available