4.2 Article

A Hardware-Friendly Tiled Singular-Value Decomposition-Based Matrix Multiplication for Transformer-Based Models
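As context for the title: SVD-based matrix multiplication replaces a dense weight matrix with a truncated low-rank factorization, trading a small approximation error for fewer multiply-adds. The NumPy sketch below shows only this generic, untiled idea; the function name, shapes, and rank are illustrative assumptions, not the paper's tiled, hardware-friendly algorithm.

```python
import numpy as np

def lowrank_matmul(x, W, rank):
    """Approximate y = x @ W via a rank-`rank` truncated SVD of W.

    Generic illustration of SVD-based matmul; the paper's tiled variant
    would presumably factorize tiles of W rather than the whole matrix
    (assumption).
    """
    # Factorize once (offline for a fixed weight): W = U @ diag(s) @ Vt.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :rank], s[:rank], Vt[:rank, :]
    # Two skinny matmuls replace one dense one: for x of shape (m, n) and
    # W of shape (n, p), cost drops from m*n*p to roughly rank*(m*n + m*p).
    return ((x @ U_k) * s_k) @ Vt_k

# Illustrative usage with arbitrary sizes.
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 512))
W = rng.standard_normal((512, 512))
y = lowrank_matmul(x, W, rank=64)  # (64, 512) approximation of x @ W
```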

Related references

Note: Only a subset of the references is listed.
Proceedings Paper · Computer Science, Hardware & Architecture

Accelerating Transformer Networks through Recomposing Softmax Layers

Jaewan Choi et al.

Summary: Transformer models achieve higher accuracy than conventional models across a range of domains, and the softmax layer has become increasingly important to their performance. Accelerating the softmax layer is challenging, however, because its data access patterns differ from those of adjacent operations. This study addresses the challenge by decomposing the layer into multiple sub-layers and fusing them with adjacent operations, yielding significant speedups (a generic version of the decomposition is sketched after this entry).

2022 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC 2022)
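For reference, the standard "safe" softmax already decomposes into three passes with distinct access patterns (a row reduction, a map plus reduction, and an elementwise map), which is the kind of structure such recomposition work can exploit when fusing with neighboring operations. The sketch below shows only that textbook decomposition, under that assumption; it is not the paper's actual sub-layers.

```python
import numpy as np

def softmax_in_passes(scores):
    """Numerically safe softmax written as three separable passes.

    Each pass has a regular access pattern, so it is a candidate for
    fusion with an adjacent matmul in an attention block (an assumption
    about the general approach, not the paper's exact sub-layers).
    """
    # Pass 1: row-wise max (pure reduction), for numerical stability.
    row_max = scores.max(axis=-1, keepdims=True)
    # Pass 2: exponentiate and accumulate row sums (map + reduction).
    exps = np.exp(scores - row_max)
    row_sum = exps.sum(axis=-1, keepdims=True)
    # Pass 3: normalize (elementwise map).
    return exps / row_sum
```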

Proceedings Paper · Computer Science, Theory & Methods

TSM2: Optimizing Tall-and-Skinny Matrix-Matrix Multiplication on GPUs

Jieyang Chen et al.

INTERNATIONAL CONFERENCE ON SUPERCOMPUTING (ICS 2019)
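TSM2 targets products where one operand is tall and skinny. The core memory-reuse idea, streaming row tiles of the large operand while the small operand stays resident, can be sketched in NumPy as below; the function name and tile size are illustrative assumptions, and this is a conceptual sketch rather than the paper's GPU kernel.

```python
import numpy as np

def tall_skinny_matmul(A, B, tile_rows=1024):
    """C = A @ B where A is tall and skinny (m >> k, small k).

    Conceptual sketch of the tiling idea: row tiles of A are streamed
    through while the small matrix B stays resident (in cache, or in
    registers/shared memory on a GPU). `tile_rows` is an illustrative
    tile size, not a value tuned in the paper.
    """
    m, _ = A.shape
    _, n = B.shape
    C = np.empty((m, n), dtype=np.result_type(A, B))
    for r0 in range(0, m, tile_rows):
        r1 = min(r0 + tile_rows, m)
        C[r0:r1] = A[r0:r1] @ B  # each tile of A read once; B reused per tile
    return C
```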

Proceedings Paper · Computer Science, Hardware & Architecture

Accelerating Matrix Multiplication in Deep Learning by Using Low-Rank Approximation

Kazuki Osawa et al.

2017 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS)