Article

E-LSTM: An Efficient Hardware Architecture for Long Short-Term Memory

Publisher

IEEE (Institute of Electrical and Electronics Engineers, Inc.)
DOI: 10.1109/JETCAS.2019.2911739

Keywords

Hardware acceleration; long short-term memory (LSTM); model compression; recurrent neural network (RNN); deep learning; FPGA

Funding

  1. National Natural Science Foundation of China [61774082, 61604068]
  2. Fundamental Research Funds for the Central Universities [021014380065, 021014380087]

Abstract

Long Short-Term Memory (LSTM) and its variants have been widely adopted in many sequential learning tasks, such as speech recognition and machine translation. Significant accuracy improvements can be achieved with complex LSTM models, but at the cost of large memory requirements and high computational complexity, which makes them time-consuming and energy-demanding. The low-latency and energy-efficiency requirements of real-world applications make model compression and hardware acceleration for LSTM an urgent need. In this paper, several hardware-efficient network compression schemes are first introduced, including structured top-k pruning, clipped gating, and multiplication-free quantization, which reduce the model size and the number of matrix operations by 32x and 21.6x, respectively, with negligible accuracy loss. Furthermore, efficient hardware architectures for accelerating the compressed LSTM are proposed, supporting inference over multiple layers and multiple time steps. The computation process is judiciously reorganized and the memory access pattern is carefully optimized, alleviating the limited-memory-bandwidth bottleneck and enabling higher throughput. Moreover, the parallel processing strategy is designed to fully exploit the sparsity introduced by pruning and clipped gating while maintaining high hardware utilization efficiency. Implemented on an Intel Arria 10 SX660 FPGA running at 200 MHz, the proposed design achieves 1.4x-2.2x higher energy efficiency and requires significantly fewer hardware resources than state-of-the-art LSTM implementations.
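To make the compression pipeline in the abstract concrete, the following minimal Python sketch combines structured top-k pruning with power-of-two (multiplication-free) quantization. It is not the authors' implementation: the function names, group size, value of k, and exponent range are all illustrative assumptions chosen only to demonstrate the two ideas.

import numpy as np

def structured_topk_prune(W, group_size=16, k=4):
    """Keep the k largest-magnitude weights in every group of
    `group_size` consecutive weights along each row; zero the rest.
    The fixed k-per-group structure keeps the resulting sparsity
    regular, which is what makes it hardware-friendly."""
    rows, cols = W.shape
    assert cols % group_size == 0, "row length must divide into groups"
    Wp = W.reshape(rows, cols // group_size, group_size).copy()
    # Indices of the (group_size - k) smallest-magnitude entries per group.
    drop = np.argsort(np.abs(Wp), axis=-1)[..., : group_size - k]
    np.put_along_axis(Wp, drop, 0.0, axis=-1)
    return Wp.reshape(rows, cols)

def pow2_quantize(W, min_exp=-6, max_exp=0):
    """Round each nonzero weight to the nearest signed power of two,
    so w * x can be computed as a bit shift of x instead of a multiply."""
    sign = np.sign(W)
    mag = np.abs(W)
    # Avoid log2(0); zeros are masked back to zero at the end.
    exp = np.clip(np.round(np.log2(np.where(mag > 0, mag, 1.0))),
                  min_exp, max_exp)
    return np.where(mag > 0, sign * np.exp2(exp), 0.0)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(8, 64))
Wc = pow2_quantize(structured_topk_prune(W))
print("kept fraction:", np.count_nonzero(Wc) / Wc.size)  # k/group_size = 0.25

Under these assumptions, the quantized weights let each multiply-accumulate be replaced by a shift-and-add, and the fixed number of survivors per group keeps sparse matrix-vector units load-balanced, which is one plausible way such a scheme maps efficiently onto an FPGA.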
