Article

CASpMV: A Customized and Accelerative SpMV Framework for the Sunway TaihuLight

Journal

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPDS.2019.2907537

Keywords

Heterogeneous many-core processor; matrix partition; optimization; parallelism; SpMV; Sunway TaihuLight supercomputer

Funding

  1. National Key R&D Program of China [2018YFB0203800, 2016YFB0200201]
  2. Outstanding Youth Science Program of National Natural Science Foundation of China [61625202]
  3. Key Program of National Natural Science Foundation of China [61432005]
  4. Youth Science Program of National Natural Science Foundation of China [61806077]
  5. Program of Hunan Provincial Innovation Foundation for Postgraduate [CX2018B230]
  6. International Postdoctoral Exchange Fellowship Program of China Postdoctoral Council [OCPC2017032]
  7. Fellowship Program of China Scholarship Council

This paper introduces CASpMV, a customized and accelerative framework for SpMV on the Sunway TaihuLight that addresses the storage limitation, load imbalance, and irregular memory-access overhead of generic parallel SpMV. CASpMV improves significantly on generic parallel SpMV on the Sunway and scales well across multiple core groups (CGs).
The Sunway TaihuLight, equipped with 10 million cores, is currently the world's third-fastest supercomputer. Sparse matrix-vector multiplication (SpMV) is one of the core algorithms in many high-performance computing applications. This paper implements a fine-grained design for generic parallel SpMV based on the special Sunway architecture and identifies three main performance limitations: storage limitation, load imbalance, and the large overhead of irregular memory accesses. To address these problems, the paper introduces a customized and accelerative framework for SpMV (CASpMV) on the Sunway. CASpMV customizes an auto-tuning four-way partition scheme for SpMV, based on a proposed statistical model that describes the structural characteristics of the sparse matrix, so that the computation better fits the computing architecture and memory hierarchy of the Sunway. Moreover, CASpMV provides an accelerative method and customized optimizations to avoid irregular memory accesses and further improve performance. On a single core group (CG, which corresponds to an MPI process) of the Sunway, CASpMV achieves an average performance improvement ranging from 588.05 to 2118.62 percent over the generic parallel SpMV, and it scales well across multiple CGs. Performance comparisons with state-of-the-art methods on the Sunway indicate that the sparsity and irregularity of data structures have less impact on CASpMV.
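
To make two of the limitations named in the abstract concrete, the following is a minimal sketch of a serial SpMV over a matrix in compressed sparse row (CSR) form, together with a simple row partition that balances nonzeros across workers. It is not the paper's four-way partition scheme or its statistical model, and it assumes the CSR format, which the abstract does not specify. The names `spmv_csr` and `partition_rows_by_nnz` are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def spmv_csr(row_ptr: Sequence[int], col_idx: Sequence[int],
             values: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR form (illustrative)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # x is indexed through col_idx: this is the irregular access.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def partition_rows_by_nnz(row_ptr: Sequence[int],
                          parts: int) -> List[Tuple[int, int]]:
    """Split rows into `parts` contiguous blocks with roughly equal nonzeros.

    Assumption: a plain one-dimensional split, used here only to show why
    splitting by row count alone can leave workers unevenly loaded.
    """
    n_rows = len(row_ptr) - 1
    nnz = row_ptr[-1]
    bounds = [0]
    for p in range(1, parts):
        target = nnz * p / parts
        # First row boundary whose cumulative nonzero count reaches target.
        b = bisect_left(row_ptr, target)
        bounds.append(min(max(b, bounds[-1]), n_rows))
    bounds.append(n_rows)
    return [(bounds[i], bounds[i + 1]) for i in range(parts)]


# 3x3 example: [[1, 0, 2], [0, 3, 0], [4, 5, 6]]
row_ptr = [0, 2, 3, 6]
col_idx = [0, 2, 1, 0, 1, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = spmv_csr(row_ptr, col_idx, values, [1.0, 1.0, 1.0])
blocks = partition_rows_by_nnz(row_ptr, 2)
```

In this example the two blocks hold three nonzeros each, even though one block covers two rows and the other covers one. The indirect read `x[col_idx[k]]` is the kind of irregular memory access that the abstract identifies as a major overhead.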
