Article

A Hardware-Friendly Tiled Singular-Value Decomposition-Based Matrix Multiplication for Transformer-Based Models

Journal

IEEE Computer Architecture Letters
Volume 22, Issue 2, Pages 169-172

Publisher

IEEE Computer Society
DOI: 10.1109/LCA.2023.3323482

Keywords

Matrix decomposition; Computational modeling; Transformers; Graphics processing units; Natural language processing; Kernel; Task analysis; Transformer-based model; GPU; tiled singular value decomposition

Abstract

Transformer-based models are widely used in NLP tasks, but matrix multiplication dominates their execution time. This paper introduces a hardware-friendly tiled singular value decomposition (TSVD) for matrix multiplication, which uses GPU resources more efficiently and mitigates the loss of important information incurred by rank-reduced SVD. The experimental results show that TSVD-matmul achieves a 1.03x to 3.24x speedup over the SVD approach.
Transformer-based models have become the backbone of numerous state-of-the-art natural language processing (NLP) tasks, including large language models. Matrix multiplication, a fundamental operation in Transformer-based models, accounts for most of the execution time. While singular value decomposition (SVD) can accelerate this operation by reducing computation and memory footprint through rank reduction, it degrades model quality because important information is difficult to preserve; moreover, it does not effectively utilize the resources of modern GPUs. In this paper, we propose a hardware-friendly approach: matrix multiplication based on tiled singular value decomposition (TSVD). TSVD divides a matrix into multiple tiles and factorizes each tile with SVD. By confining the decomposition to smaller regions, TSVD mitigates the loss of important data. We apply the TSVD-decomposed matrices to matrix multiplication, and the resulting TSVD-based matrix multiplication (TSVD-matmul) leverages GPU resources more efficiently than the SVD approach. As a result, TSVD-matmul achieves a speedup of 1.03x to 3.24x over the SVD approach at compression ratios ranging from 2 to 8. When deployed in GPT-2, TSVD not only performs competitively with full fine-tuning on the E2E NLG task but also achieves a speedup of 1.06x to 1.24x at compression ratios of 2 to 8 while improving the BLEU score by up to 1.5 points.
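
To make the idea concrete, the following is a minimal NumPy sketch of per-tile truncated SVD and the block-wise multiplication the abstract describes. It is an illustration under stated assumptions, not the authors' GPU implementation: the tile size, rank, and function names (tsvd_factor, tsvd_matmul) are hypothetical choices for exposition.

import numpy as np

def tsvd_factor(W, tile, rank):
    """Approximate each (tile x tile) block of W by its top-`rank`
    singular triplets. Returns per-tile low-rank factors.
    Illustrative sketch; tile/rank are assumed parameters."""
    m, n = W.shape
    assert m % tile == 0 and n % tile == 0
    factors = {}
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            U, s, Vt = np.linalg.svd(W[i:i+tile, j:j+tile],
                                     full_matrices=False)
            # Keep only the leading `rank` components of this tile;
            # fold the singular values into the left factor.
            factors[(i, j)] = (U[:, :rank] * s[:rank], Vt[:rank, :])
    return factors

def tsvd_matmul(factors, x, tile):
    """Compute an approximation of W @ x from the per-tile factors:
    y_i += (U_ij s_ij) @ (V_ij^T @ x_j) for every tile (i, j)."""
    m = max(i for i, _ in factors) + tile
    y = np.zeros((m,) + x.shape[1:], dtype=x.dtype)
    for (i, j), (Us, Vt) in factors.items():
        y[i:i+tile] += Us @ (Vt @ x[j:j+tile])
    return y

# Usage: compare against the exact product; error shrinks as rank grows.
rng = np.random.default_rng(0)
W = rng.standard_normal((128, 128))
x = rng.standard_normal((128, 4))
factors = tsvd_factor(W, tile=32, rank=8)
y_approx = tsvd_matmul(factors, x, tile=32)
err = np.linalg.norm(y_approx - W @ x) / np.linalg.norm(W @ x)

Under this per-tile truncation, each tile x tile block (tile^2 values) is stored as 2 * tile * rank values, so the storage compression ratio works out to tile / (2 * rank); for example, tile = 32 with rank = 8 gives a ratio of 2, and rank = 2 gives 8. Whether this matches the paper's exact definition of compression ratio is an assumption.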
