4.7 Article

ROSETTA: A Resource and Energy-Efficient Inference Processor for Recurrent Neural Networks Based on Programmable Data Formats and Fine Activation Pruning

Related references

Note: Only a subset of the references is listed.
Article Computer Science, Hardware & Architecture

Recurrent Neural Networks With Column-Wise Matrix-Vector Multiplication on FPGAs

Zhiqiang Que et al.

Summary: This article presents a reconfigurable accelerator for REcurrent Neural networks with fine-grained cOlumn-Wise matrix-vector multiplicatioN (RENOWN). The design utilizes column-wise matrix-vector multiplication and introduces a latency-hiding architecture and a configurable tiling strategy to improve hardware utilization and system throughput. Evaluation results show superior performance compared to existing accelerators on FPGAs.

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS (2022)
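The column-wise matrix-vector multiplication scheme that RENOWN builds on can be sketched generically: instead of computing one dot product per output row, each input element scales an entire weight column, and the scaled columns are accumulated. This is an illustrative reconstruction of the general technique, not code from the paper:

```python
import numpy as np

def column_wise_mv(W, x):
    """Compute y = W @ x by accumulating scaled columns.

    Each input element x[j] scales column W[:, j]; the partial
    results accumulate into y. On hardware, a zero (or pruned)
    x[j] lets the whole column of work be skipped.
    """
    y = np.zeros(W.shape[0])
    for j in range(W.shape[1]):
        if x[j] != 0.0:          # skip work for zero activations
            y += W[:, j] * x[j]
    return y

W = np.arange(6, dtype=float).reshape(2, 3)
x = np.array([1.0, 0.0, 2.0])
print(column_wise_mv(W, x))      # agrees with W @ x
```

The column-wise order is what makes activation-level sparsity easy to exploit: the zero test gates a full column, not a single multiply.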

Article Computer Science, Hardware & Architecture

A low-latency LSTM accelerator using balanced sparsity based on FPGA

Jingfei Jiang et al.

Summary: This paper proposes a shared index bank-balanced sparsity (SIBBS) compression method to accelerate LSTM inference. The method achieves a high compression ratio and significantly reduces latency with little accuracy degradation or overhead. A customized accelerator implemented on the Xilinx XCKU115 FPGA outperforms existing FPGA-based LSTM accelerators in terms of latency.

MICROPROCESSORS AND MICROSYSTEMS (2022)
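The bank-balanced sparsity idea underlying SIBBS can be illustrated with a minimal pruning sketch: each weight row is split into equal banks, and only the k largest-magnitude weights survive in each bank, so every parallel hardware lane gets the same workload. This shows the generic bank-balanced pattern only, not the shared-index variant from the paper:

```python
import numpy as np

def bank_balanced_prune(row, num_banks, k):
    """Prune a weight row so every bank keeps exactly k nonzeros.

    The row is split into equal-sized banks; within each bank only
    the k largest-magnitude weights are kept. Equal nonzero counts
    per bank keep parallel multiply lanes load-balanced.
    """
    banks = np.split(row.astype(float).copy(), num_banks)
    for bank in banks:
        drop = np.argsort(np.abs(bank))[:-k]  # all but the k largest |w|
        bank[drop] = 0.0
    return np.concatenate(banks)

row = np.array([0.1, -0.5, 0.3, 0.2, 0.9, -0.1, 0.4, 0.05])
print(bank_balanced_prune(row, num_banks=2, k=2))
```

Because each bank holds exactly k nonzeros, the index storage per bank is fixed-width, which is what keeps the decoding overhead small in hardware.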

Article Computer Science, Hardware & Architecture

When Massive GPU Parallelism Ain't Enough: A Novel Hardware Architecture of 2D-LSTM Neural Network

Vladimir Rybalkin et al.

Summary: Multidimensional Long Short-Term Memory (MD-LSTM) neural networks achieve state-of-the-art results in various applications but suffer from slow implementations. This research accelerates MD-LSTM inference on a Field-Programmable Gate Array (FPGA) platform with a new hardware architecture that achieves higher throughput, energy efficiency, and resource efficiency.

ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS (2022)

Article Engineering, Electrical & Electronic

GBC: An Energy-Efficient LSTM Accelerator With Gating Units Level Balanced Compression Strategy

Bi Wu et al.

Summary: This paper investigates the Gating Units Level Balanced Compression (GBC) strategy for recurrent neural networks (RNNs), achieving high compression rates for long short-term memory (LSTM) models with little parameter overhead. Experimental results demonstrate that GBC significantly reduces the storage space and computational complexity of LSTM models while maintaining accuracy, and hardware experiments show improved energy efficiency compared to state-of-the-art designs.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS (2022)

Article Computer Science, Information Systems

PermLSTM: A High Energy-Efficiency LSTM Accelerator Architecture

Yong Zheng et al.

Summary: The paper introduces a normalized linear quantization method and permuted block diagonal mask matrices to generate sparse LSTM models, together with the high-energy-efficiency accelerator PermLSTM, yielding a 55.1% reduction in power consumption. The accelerator achieves a 2.19x to 24.4x increase in energy efficiency over previous FPGA-based LSTM accelerators.

ELECTRONICS (2021)
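A permuted block diagonal mask of the kind PermLSTM's summary mentions can be sketched as follows: build a block-diagonal 0/1 matrix, then shuffle its rows and columns. Every row and column keeps the same nonzero count, giving a regular sparsity pattern that is cheap to index in hardware. This is a generic illustration under that reading of the summary, not the paper's exact construction:

```python
import numpy as np

def permuted_block_diag_mask(n, block, seed=0):
    """n x n 0/1 mask: block-diagonal ones with shuffled rows/columns.

    Each row and each column retains exactly `block` nonzeros, so a
    weight matrix pruned with this mask has a load-balanced sparsity
    pattern with fixed-width index storage.
    """
    assert n % block == 0, "block size must divide n"
    rng = np.random.default_rng(seed)
    mask = np.kron(np.eye(n // block), np.ones((block, block)))
    return mask[rng.permutation(n)][:, rng.permutation(n)]

mask = permuted_block_diag_mask(8, 2)
print(mask.sum(axis=1))  # every row holds `block` nonzeros
```

Row and column permutations preserve the per-row and per-column nonzero counts of the block-diagonal base, which is the property the hardware exploits.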

Article Computer Science, Hardware & Architecture

Specializing FGPU for Persistent Deep Learning

Rui Ma et al.

Summary: Overlay architectures enable fast development and debugging on FPGAs, albeit with potentially lower performance than fully customized designs. Used alongside hand-tuned FPGA solutions, performant overlay architectures can improve solution efficiency and overall productivity.

ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS (2021)

Article Computer Science, Information Systems

AERO: A 1.28 MOP/s/LUT Reconfigurable Inference Processor for Recurrent Neural Networks in a Resource-Limited FPGA

Jinwon Kim et al.

Summary: AERO is a resource-efficient reconfigurable inference processor designed for recurrent neural networks (RNNs) of various types. It achieves high resource efficiency with a versatile vector-processing unit (VPU) that processes primitive vector operations and an approximation scheme for multiplication. AERO's resource efficiency, 1.28 MOP/s/LUT, is significantly higher than the previous state-of-the-art result.

ELECTRONICS (2021)

Article Computer Science, Hardware & Architecture

POLAR: A Pipelined/Overlapped FPGA-Based LSTM Accelerator

Erfan Bank-Tavakoli et al.

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS (2020)

Article Engineering, Electrical & Electronic

An 8.93 TOPS/W LSTM Recurrent Neural Network Accelerator Featuring Hierarchical Coarse-Grain Sparsity for On-Device Speech Recognition

Deepak Kadetotad et al.

IEEE JOURNAL OF SOLID-STATE CIRCUITS (2020)

Article Computer Science, Information Systems

Efficient Hardware Architectures for 1D- and MD-LSTM Networks

Vladimir Rybalkin et al.

JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY (2020)

Article Computer Science, Information Systems

Mapping Large LSTMs to FPGAs with Weight Reuse

Zhiqiang Que et al.

JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY (2020)

Article Engineering, Electrical & Electronic

EdgeDRNN: Recurrent Neural Network Accelerator for Edge Inference

Chang Gao et al.

IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS (2020)

Article Computer Science, Information Systems

Approximate LSTM Computing for Energy-Efficient Speech Recognition

Junseo Jo et al.

ELECTRONICS (2020)

Proceedings Paper Computer Science, Hardware & Architecture

Beyond Peak Performance: Comparing the Real Performance of AI-Optimized FPGAs and GPUs

Andrew Boutros et al.

2020 INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE TECHNOLOGY (ICFPT 2020) (2020)

Proceedings Paper Computer Science, Hardware & Architecture

Achieving Full Parallelism in LSTM via a Unified Accelerator Design

Xinyi Zhang et al.

2020 IEEE 38TH INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD 2020) (2020)

Article Engineering, Electrical & Electronic

E-LSTM: An Efficient Hardware Architecture for Long Short-Term Memory

Meiqi Wang et al.

IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS (2019)

Proceedings Paper Computer Science, Theory & Methods

Efficient and Effective Sparse LSTM on FPGA with Bank-Balanced Sparsity

Shijie Cao et al.

PROCEEDINGS OF THE 2019 ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS (FPGA'19) (2019)

Proceedings Paper Computer Science, Hardware & Architecture

Why Compete When You Can Work Together: FPGA-ASIC Integration for Persistent RNNs

Eriko Nurvitadhi et al.

2019 27TH IEEE ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM) (2019)

Article Computer Science, Artificial Intelligence

Sequence classification for credit-card fraud detection

Johannes Jurgovsky et al.

EXPERT SYSTEMS WITH APPLICATIONS (2018)

Article Computer Science, Artificial Intelligence

LSTM: A Search Space Odyssey

Klaus Greff et al.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2017)

Proceedings Paper Engineering, Electrical & Electronic

DNPU: An 8.1TOPS/W Reconfigurable CNN-RNN Processor for General-Purpose Deep Neural Networks

Dongjoo Shin et al.

2017 IEEE INTERNATIONAL SOLID-STATE CIRCUITS CONFERENCE (ISSCC) (2017)

Proceedings Paper Computer Science, Artificial Intelligence

Channel Pruning for Accelerating Very Deep Neural Networks

Yihui He et al.

2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) (2017)

Proceedings Paper Computer Science, Hardware & Architecture

ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA

Song Han et al.

FPGA'17: PROCEEDINGS OF THE 2017 ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS (2017)

Article Computer Science, Artificial Intelligence

A Novel Connectionist System for Unconstrained Handwriting Recognition

Alex Graves et al.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2009)