Article

A Hardware-Friendly Tiled Singular-Value Decomposition-Based Matrix Multiplication for Transformer-Based Models

Journal

IEEE Computer Architecture Letters
Volume 22, Issue 2, Pages 169-172

Publisher

IEEE Computer Society
DOI: 10.1109/LCA.2023.3323482

Keywords

Matrix decomposition; Computational modeling; Transformers; Graphics processing units; Natural language processing; Kernel; Task analysis; Transformer-based model; GPU; tiled singular value decomposition

Abstract

Transformer-based models are widely used in NLP tasks, but matrix multiplication dominates their execution time. This paper introduces a hardware-friendly tiled singular value decomposition (TSVD) for matrix multiplication, which uses GPU resources more efficiently and mitigates the loss of important information incurred by rank-reduced SVD. The experimental results show that TSVD-matmul achieves a 1.03x to 3.24x speedup over the SVD approach.
Transformer-based models have become the backbone of numerous state-of-the-art natural language processing (NLP) tasks, including large language models. Matrix multiplication, a fundamental operation in Transformer-based models, accounts for most of the execution time. While singular value decomposition (SVD) can accelerate this operation by reducing computation and memory footprint through rank reduction, it degrades model quality because important information is difficult to preserve; moreover, it does not effectively utilize the resources of modern GPUs. In this paper, we propose a hardware-friendly approach: matrix multiplication based on tiled singular value decomposition (TSVD). TSVD divides a matrix into multiple tiles and factorizes each tile with SVD. By confining the decomposition to smaller regions, TSVD mitigates the loss of important data. We apply the TSVD-decomposed matrices to matrix multiplication, and the resulting TSVD-based matrix multiplication (TSVD-matmul) leverages GPU resources more efficiently than the SVD approach. As a result, TSVD-matmul achieves a speedup of 1.03x to 3.24x over the SVD approach at compression ratios ranging from 2 to 8. When deployed in GPT-2, TSVD not only performs competitively with full fine-tuning on the E2E NLG task but also achieves a speedup of 1.06x to 1.24x at compression ratios of 2 to 8 while improving the BLEU score by up to 1.5 points.
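
To make the idea concrete, the following is a minimal NumPy sketch of per-tile truncated SVD and the block-wise multiplication the abstract describes. It is an illustration under stated assumptions, not the authors' GPU implementation: the tile size, rank, and function names (tsvd_factor, tsvd_matmul) are hypothetical choices for exposition.

import numpy as np

def tsvd_factor(W, tile, rank):
    """Approximate each (tile x tile) block of W by its top-`rank`
    singular triplets. Returns per-tile low-rank factors.
    Illustrative sketch; tile/rank are assumed parameters."""
    m, n = W.shape
    assert m % tile == 0 and n % tile == 0
    factors = {}
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            U, s, Vt = np.linalg.svd(W[i:i+tile, j:j+tile],
                                     full_matrices=False)
            # Keep only the leading `rank` components of this tile;
            # fold the singular values into the left factor.
            factors[(i, j)] = (U[:, :rank] * s[:rank], Vt[:rank, :])
    return factors

def tsvd_matmul(factors, x, tile):
    """Compute an approximation of W @ x from the per-tile factors:
    y_i += (U_ij s_ij) @ (V_ij^T @ x_j) for every tile (i, j)."""
    m = max(i for i, _ in factors) + tile
    y = np.zeros((m,) + x.shape[1:], dtype=x.dtype)
    for (i, j), (Us, Vt) in factors.items():
        y[i:i+tile] += Us @ (Vt @ x[j:j+tile])
    return y

# Usage: compare against the exact product; error shrinks as rank grows.
rng = np.random.default_rng(0)
W = rng.standard_normal((128, 128))
x = rng.standard_normal((128, 4))
factors = tsvd_factor(W, tile=32, rank=8)
y_approx = tsvd_matmul(factors, x, tile=32)
err = np.linalg.norm(y_approx - W @ x) / np.linalg.norm(W @ x)

Under this per-tile truncation, each tile x tile block (tile^2 values) is stored as 2 * tile * rank values, so the storage compression ratio works out to tile / (2 * rank); for example, tile = 32 with rank = 8 gives a ratio of 2, and rank = 2 gives 8. Whether this matches the paper's exact definition of compression ratio is an assumption.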
