Article; Proceedings Paper

Model-driven Autotuning of Sparse Matrix-Vector Multiply on GPUs

Journal

ACM SIGPLAN NOTICES
Volume 45, Issue 5, Pages 115-125

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/1837853.1693471

Keywords

Algorithms; Performance; GPU; sparse matrix-vector multiplication; performance modeling

Funding

  1. National Science Foundation (NSF) [0833136]
  2. NSF TeraGrid allocation [CCR-090024]
  3. NSF / Semiconductor Research Corporation (SRC) [0903447, 1981]
  4. Defense Advanced Research Projects Agency (DARPA)
  5. NSF Directorate for Computer & Information Science & Engineering, Division of Computing and Communication Foundations [0953100, 0903447, 0833136]

Abstract

We present a performance model-driven framework for automated performance tuning (autotuning) of sparse matrix-vector multiply (SpMV) on systems accelerated by graphics processing units (GPUs). Our study consists of two parts. First, we describe several carefully hand-tuned SpMV implementations for GPUs, identifying key GPU-specific performance limitations, enhancements, and tuning opportunities. These implementations, which include variants on classical blocked compressed sparse row (BCSR) and blocked ELLPACK (BELLPACK) storage formats, match or exceed state-of-the-art implementations. For instance, our best BELLPACK implementation achieves up to 29.0 Gflop/s in single precision and 15.7 Gflop/s in double precision on the NVIDIA T10P multiprocessor (C1060), improving on prior state-of-the-art unblocked implementations (Bell and Garland, 2009) by up to 1.8x and 1.5x for single and double precision, respectively. However, achieving this level of performance requires input matrix-dependent parameter tuning. Thus, in the second part of this study, we develop a performance model that can guide tuning. Like prior autotuning models for CPUs (e.g., Im, Yelick, and Vuduc, 2004), this model requires offline measurements and run-time estimation, but it more directly models the structure of multithreaded vector processors like GPUs. We show that our model can identify implementations whose performance is within 15% of the best found through exhaustive search.
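For readers unfamiliar with the storage layouts involved, the following is a minimal, unblocked ELLPACK-style SpMV kernel in CUDA, shown purely for illustration; the kernel name, parameter names, and zero-padding convention are assumptions here, not the authors' BELLPACK implementation. In ELLPACK, every row is padded to a common length and the padded arrays are stored column-major, so that assigning one thread per row yields coalesced memory accesses.

```cuda
// Illustrative ELLPACK SpMV sketch (not the paper's BELLPACK code).
// vals/cols are num_rows x max_nnz_per_row arrays stored column-major:
// the c-th stored entry of row r lives at index c * num_rows + r.
// Padded slots are assumed to hold value 0.0f and a valid column index.
__global__ void spmv_ellpack(int num_rows, int max_nnz_per_row,
                             const int   *cols,  // padded column indices
                             const float *vals,  // padded nonzero values
                             const float *x,     // dense input vector
                             float       *y)     // dense output vector
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < num_rows) {
        float sum = 0.0f;
        // Column-major layout: at each step c, consecutive threads (rows)
        // read consecutive addresses, giving coalesced global loads.
        for (int c = 0; c < max_nnz_per_row; ++c) {
            int   idx = c * num_rows + row;
            float v   = vals[idx];
            if (v != 0.0f)
                sum += v * x[cols[idx]];
        }
        y[row] = sum;
    }
}
```

The blocked variants studied in the paper (BCSR and BELLPACK) additionally group nonzeros into small dense blocks, trading some explicit zero fill for reduced index storage and register-level reuse; the block dimensions are among the input-dependent parameters that the performance model is used to select.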
