4.7 Article

ETA: An Efficient Training Accelerator for DNNs Based on Hardware-Algorithm Co-Optimization

Related References

Note: only a partial list of references is shown; see the original article for the full list.
Article Engineering, Electrical & Electronic

High Performance CNN Accelerators Based on Hardware and Algorithm Co-Optimization

Tian Yuan et al.

Summary: This paper introduces a hardware-oriented CNN compression strategy that combines pruned and non-pruned layers to balance compression ratio against processing efficiency. Through hardware/algorithm co-optimization, the resulting non-pruned/pruned (NP-P) hybrid compressed CNN model was implemented on FPGAs.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS (2021)
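
A minimal Python sketch of the selective-pruning idea behind the NP-P hybrid model is given below: some layers are magnitude-pruned while others stay dense. The layer names, shapes, 75% sparsity target, and the choice of which layers to prune are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude entries until roughly `sparsity` of them are zero."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

# Hypothetical 4-layer CNN; names and shapes are illustrative only.
layers = {
    "conv1": np.random.randn(16, 3, 3, 3),
    "conv2": np.random.randn(32, 16, 3, 3),
    "conv3": np.random.randn(64, 32, 3, 3),
    "fc":    np.random.randn(10, 64),
}

# Assumed NP-P split: early layers stay dense (non-pruned, "NP") so the compute
# array stays fully utilized; later, more redundant layers are pruned ("P").
pruned = {"conv3", "fc"}

compressed = {
    name: magnitude_prune(w, sparsity=0.75) if name in pruned else w
    for name, w in layers.items()
}

for name, w in compressed.items():
    print(f"{name}: density = {np.count_nonzero(w) / w.size:.2f}")
```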

Article Engineering, Electrical & Electronic

Evolver: A Deep Learning Processor With On-Device Quantization-Voltage-Frequency Tuning

Fengbin Tu et al.

Summary: This article introduces Evolver, a deep learning processor architecture that enables on-device quantization-voltage-frequency (QVF) tuning for optimized DNN deployment. Evolver uses reinforcement learning to search for the best QVF policy and then runs the newly quantized DNN inference at the searched voltage and frequency. Bidirectional speculation and runtime reconfiguration techniques further improve performance and energy efficiency.

IEEE JOURNAL OF SOLID-STATE CIRCUITS (2021)
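
As a rough illustration of what a QVF policy search involves, the sketch below runs a random search over a toy quantization/voltage/frequency space against a made-up accuracy/energy model. Evolver's actual reinforcement-learning agent, reward design, and on-device measurements differ; every constant here is an assumption.

```python
import random

# Hypothetical candidate settings; not Evolver's real search space.
BITWIDTHS = [4, 6, 8]
VOLTAGES  = [0.6, 0.8, 1.0]   # V
FREQS     = [200, 400, 800]   # MHz

def evaluate(bits, volt, freq):
    """Stand-in for on-device measurement: a made-up analytic model,
    NOT Evolver's measured accuracy/energy feedback."""
    accuracy = 0.9 - 0.02 * (8 - bits)        # lower precision -> lower accuracy
    energy = volt ** 2 * freq * bits / 8      # CV^2*f-style cost, scaled by precision
    return accuracy, energy

def reward(accuracy, energy, acc_floor=0.85):
    # Minimize energy subject to an accuracy constraint.
    return -energy if accuracy >= acc_floor else float("-inf")

best, best_r = None, float("-inf")
for _ in range(100):                          # random search in place of RL
    policy = (random.choice(BITWIDTHS), random.choice(VOLTAGES), random.choice(FREQS))
    r = reward(*evaluate(*policy))
    if r > best_r:
        best, best_r = policy, r

print("best (bits, V, MHz):", best)
```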

Article Engineering, Electrical & Electronic

SNAP: An Efficient Sparse Neural Acceleration Processor for Unstructured Sparse Deep Neural Network Inference

Jie-Fang Zhang et al.

Summary: This work introduces a sparse neural acceleration processor (SNAP) that exploits unstructured sparsity in DNNs, achieving 75% average compute utilization through parallel associative search. SNAP follows a channel-first dataflow and uses a two-level partial-sum (psum) reduction dataflow to cut access contention and psum writeback traffic by 22x. The prototype SNAP chip, implemented in 16-nm CMOS technology, achieves a peak effectual efficiency of 21.55 TOPS/W for CONV layers at 10% weight and activation densities.

IEEE JOURNAL OF SOLID-STATE CIRCUITS (2021)
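
The kernel of SNAP's approach, issuing a multiplication only where a nonzero weight meets a nonzero activation at the same input-channel index, can be mimicked in software. The sketch below uses sorted-index intersection as a stand-in for the chip's parallel associative search; the channel count and ~10% densities are illustrative assumptions.

```python
import numpy as np

def sparse_dot(w_idx, w_val, a_idx, a_val):
    """Multiply only index-matched nonzeros: a software stand-in for
    SNAP-style associative index matching (index arrays must be sorted)."""
    matches = np.intersect1d(w_idx, a_idx)
    return np.dot(w_val[np.searchsorted(w_idx, matches)],
                  a_val[np.searchsorted(a_idx, matches)])

# Hypothetical compressed tensors at ~10% density: (sorted channel indices, values).
rng = np.random.default_rng(0)
C = 256                                          # input channels
w_idx = np.sort(rng.choice(C, size=C // 10, replace=False))
a_idx = np.sort(rng.choice(C, size=C // 10, replace=False))
w_val = rng.standard_normal(w_idx.size)
a_val = rng.standard_normal(a_idx.size)

print("effectual MACs:", np.intersect1d(w_idx, a_idx).size, "of", C)
print("partial sum:", sparse_dot(w_idx, w_val, a_idx, a_val))
```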

Article Computer Science, Hardware & Architecture

Evaluations on Deep Neural Networks Training Using Posit Number System

Jinming Lu et al.

Summary: This article investigates training deep neural networks (DNNs) with low-bit posit numbers, a Type-III universal number (unum) format, showing the potential of posits in DNN training. A DNN training framework using 8-bit posits is proposed with a novel tensor-wise scaling scheme, matching state-of-the-art performance across various datasets and model architectures. Additionally, an energy-efficient hardware prototype is designed that reduces area, power, and memory capacity compared with standard floating-point counterparts.

IEEE TRANSACTIONS ON COMPUTERS (2021)
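
For readers unfamiliar with posits, below is a reference decoder for an n-bit posit following the Type-III unum definition, value = (-1)^sign * (2^(2^es))^k * 2^e * (1 + f), where k is set by the regime run length. The es = 1 configuration and the test values are assumptions for illustration, not necessarily the paper's setup.

```python
def posit_decode(p, n=8, es=1):
    """Decode an n-bit posit with `es` exponent bits into a Python float."""
    mask = (1 << n) - 1
    p &= mask
    if p == 0:
        return 0.0
    if p == 1 << (n - 1):
        return float("nan")                      # Not-a-Real (NaR)
    sign = p >> (n - 1)
    if sign:
        p = (-p) & mask                          # negatives: two's complement first

    # Regime: run of identical bits starting just below the sign bit.
    r0 = (p >> (n - 2)) & 1
    m = 1
    while m < n - 1 and ((p >> (n - 2 - m)) & 1) == r0:
        m += 1
    k = m - 1 if r0 else -m

    # Exponent and fraction come from whatever bits remain after the
    # sign, the regime run, and its terminating bit.
    rem = n - 2 - m
    tail = p & ((1 << rem) - 1) if rem > 0 else 0
    if rem >= es:
        e = tail >> (rem - es)
        nf = rem - es
        f = (tail & ((1 << nf) - 1)) / (1 << nf) if nf > 0 else 0.0
    else:
        e = tail << (es - rem)                   # truncated exponent bits are zero
        f = 0.0

    return (-1.0) ** sign * 2.0 ** (k * (1 << es) + e) * (1.0 + f)

for p in (0x40, 0x50, 0x60, 0x7F, 0xC0):         # 1.0, 2.0, 4.0, maxpos, -1.0
    print(f"{p:#04x} -> {posit_decode(p)}")
```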

Article Computer Science, Artificial Intelligence

Training high-performance and large-scale deep neural networks with full 8-bit integers

Yukuan Yang et al.

NEURAL NETWORKS (2020)

Article Engineering, Electrical & Electronic

Federated Learning: Challenges, Methods, and Future Directions

Tian Li et al.

IEEE SIGNAL PROCESSING MAGAZINE (2020)

Article Engineering, Electrical & Electronic

An Energy-Efficient Deep Convolutional Neural Network Training Accelerator for In Situ Personalization on Smart Devices

Seungkyu Choi et al.

IEEE JOURNAL OF SOLID-STATE CIRCUITS (2020)

Proceedings Paper Computer Science, Hardware & Architecture

FPGA-based Low-Batch Training Accelerator for Modern CNNs Featuring High Bandwidth Memory

Shreyas K. Venkataramanaiah et al.

2020 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN (ICCAD) (2020)

Proceedings Paper Computer Science, Artificial Intelligence

Towards Unified INT8 Training for Convolutional Neural Network

Feng Zhu et al.

2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) (2020)

Review Computer Science, Artificial Intelligence

Continual lifelong learning with neural networks: A review

German I. Parisi et al.

NEURAL NETWORKS (2019)

Proceedings Paper Computer Science, Hardware & Architecture

Automatic Compiler Based FPGA Accelerator for CNN Training

Shreyas Kolala Venkataramanaiah et al.

2019 29TH INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE LOGIC AND APPLICATIONS (FPL) (2019)

Article Engineering, Electrical & Electronic

Efficient Hardware Architectures for Deep Convolutional Neural Network

Jichen Wang et al.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS (2018)

Article Engineering, Electrical & Electronic

Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks

Yu-Hsin Chen et al.

IEEE JOURNAL OF SOLID-STATE CIRCUITS (2017)

Article Engineering, Electrical & Electronic

Efficient Processing of Deep Neural Networks: A Tutorial and Survey

Vivienne Sze et al.

PROCEEDINGS OF THE IEEE (2017)

Article Computer Science, Hardware & Architecture

New Formats for Computing with Real-Numbers under Round-to-Nearest

Javier Hormigo et al.

IEEE TRANSACTIONS ON COMPUTERS (2016)