4.4 Article

Towards Efficient In-Memory Computing Hardware for Quantized Neural Networks: State-of-the-Art, Open Challenges and Perspectives

Related References

Note: only a subset of the references is listed here; see the original article for the complete bibliography.
Article Computer Science, Hardware & Architecture

FAT: An In-Memory Accelerator With Fast Addition for Ternary Weight Neural Networks

Shien Zhu et al.

Summary: This article proposes FAT, a novel in-memory computing (IMC) accelerator for ternary weight networks (TWNs) that speeds up inference by exploiting weight sparsity and a fast addition scheme.

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS (2023)
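The arithmetic that ternary-weight accelerators such as FAT exploit can be seen in a few lines: with weights restricted to {-1, 0, +1}, a dot product needs no multipliers, and zero weights cost nothing. The sketch below is a minimal illustration of that idea, not FAT's hardware design; all names are illustrative.

```python
import numpy as np

def ternary_dot(inputs: np.ndarray, weights: np.ndarray) -> int:
    """Dot product with weights constrained to {-1, 0, +1}.

    Zero weights are skipped entirely (sparsity), and the remaining
    terms need only addition/subtraction, never multiplication.
    """
    assert set(np.unique(weights)).issubset({-1, 0, 1})
    acc = 0
    for x, w in zip(inputs, weights):
        if w == 0:                        # sparsity: zero weights are skipped
            continue
        acc += x if w == 1 else -x        # +1 -> add, -1 -> subtract
    return acc

x = np.array([3, 1, 4, 1, 5])
w = np.array([1, 0, -1, 0, 1])            # ternary weights
print(ternary_dot(x, w))                  # 3 - 4 + 5 = 4
```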

Article Engineering, Electrical & Electronic

Two-Way Transpose Multibit 6T SRAM Computing-in-Memory Macro for Inference-Training AI Edge Chips

Jian-Wei Su et al.

Summary: This article introduces an SRAM-based computing-in-memory (CIM) macro for energy-efficient multiply-and-accumulate (MAC) operations in AI edge devices. Using a two-way transpose (TWT) multiply cell and a novel read scheme, the macro achieves strong resistance to process variation and high energy efficiency for MAC operations across a range of input, weight, and output bit widths.

IEEE JOURNAL OF SOLID-STATE CIRCUITS (2022)
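How a CIM macro supports "various bit lengths" can be illustrated in software: a multibit MAC decomposes into 1-bit bit-plane dot products, each weighted by a power of two. This is a generic bit-serial sketch of that decomposition, not Su et al.'s circuit; the function and parameters are assumptions for illustration.

```python
import numpy as np

def bitserial_mac(inputs, weights, in_bits=4, w_bits=4):
    """Multibit MAC composed from binary bit-plane dot products."""
    acc = 0
    for i in range(in_bits):                # input bit planes (LSB first)
        x_plane = (inputs >> i) & 1
        for j in range(w_bits):             # weight bit planes
            w_plane = (weights >> j) & 1
            # each binary partial product is shift-weighted by 2^(i+j)
            acc += int(x_plane @ w_plane) << (i + j)
    return acc

x = np.array([5, 3, 7], dtype=np.int64)     # 4-bit unsigned inputs
w = np.array([2, 6, 1], dtype=np.int64)     # 4-bit unsigned weights
assert bitserial_mac(x, w) == int(x @ w)    # 10 + 18 + 7 = 35
```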

Article Engineering, Electrical & Electronic

An Embedded NAND Flash-Based Compute-In-Memory Array Demonstrated in a Standard Logic Process

Minsu Kim et al.

Summary: Inspired by the 3D NAND flash array structure, the authors experimentally demonstrate neural network hardware with high recognition accuracy and low current variation.

IEEE JOURNAL OF SOLID-STATE CIRCUITS (2022)

Article Computer Science, Hardware & Architecture

CMQ: Crossbar-Aware Neural Network Mixed-Precision Quantization via Differentiable Architecture Search

Jie Peng et al.

Summary: This study proposes a crossbar-aware mixed-precision quantization scheme that improves the accuracy and robustness of neural networks. By dynamically adjusting the group size and running a fine-grained precision search flow, the method delivers significant gains in inference accuracy and resource savings. Experiments further show that the mixed-precision network with noise-adaptation training is more robust to noise than fixed-precision networks.

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS (2022)
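The core of a differentiable precision search can be shown compactly: each layer keeps learnable logits over candidate bitwidths, and the forward pass mixes the quantized branches with softmax weights so the precision choice itself receives gradients. The following is a minimal DARTS-style sketch in the spirit of CMQ, not a reproduction of it; the quantizer and loss are stand-in assumptions.

```python
import torch
import torch.nn.functional as F

def fake_quant(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform symmetric fake-quantization with a straight-through estimator."""
    scale = w.abs().max() / (2 ** (bits - 1) - 1)
    q = torch.round(w / scale).clamp(-(2 ** (bits - 1) - 1), 2 ** (bits - 1) - 1)
    return w + (q * scale - w).detach()   # STE: identity gradient through rounding

candidate_bits = [2, 4, 8]
w = torch.randn(64, 64, requires_grad=True)                    # a layer's weights
alpha = torch.zeros(len(candidate_bits), requires_grad=True)   # bitwidth logits

probs = F.softmax(alpha, dim=0)
w_mixed = sum(p * fake_quant(w, b) for p, b in zip(probs, candidate_bits))
loss = w_mixed.square().mean()        # stand-in for the task loss
loss.backward()                       # gradients reach both w and alpha
print(probs.detach(), alpha.grad)
```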

Proceedings Paper Computer Science, Artificial Intelligence

NAX: Neural Architecture and Memristive Xbar based Accelerator Co-design

Shubham Negi et al.

Summary: Integrating neural architecture search (NAS) with memristive crossbar array (MCA) based in-memory computing (IMC) accelerators remains an open problem. This study proposes NAX, an efficient NAS engine that co-designs the neural network and the IMC hardware architecture to achieve optimal trade-offs between hardware efficiency and application accuracy.

PROCEEDINGS OF THE 59TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC 2022 (2022)

Proceedings Paper Engineering, Electrical & Electronic

Towards Efficient RRAM-based Quantized Neural Networks Hardware: State-of-the-art and Open Issues

O. Krestinskaya et al.

Summary: This paper provides a comprehensive analysis of state-of-the-art RRAM-based QNN implementations, situating RRAM among technologies for efficient QNN hardware. It covers hardware and device challenges related to QNNs and discusses the main unsolved issues and possible future research directions.

2022 IEEE 22ND INTERNATIONAL CONFERENCE ON NANOTECHNOLOGY (NANO) (2022)

Article Engineering, Electrical & Electronic

Inference Dropouts in Binary Weighted Analog Memristive Crossbar

Alex James et al.

Summary: Stochastic dropout and weight binarization at the inference stage improve the energy efficiency and robustness of memristive crossbar accelerators, enabling reliable edge AI computing devices.

IEEE TRANSACTIONS ON NANOTECHNOLOGY (2022)
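A hedged sketch of the idea summarized above: binarize the weights for the crossbar and apply stochastic dropout at inference time, averaging a few noisy forward passes. The dropout rate, pass count, and shapes below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def binarize(w):
    return np.sign(w) + (w == 0)              # weights in {-1, +1}

def inference_with_dropout(x, w, p_drop=0.1, passes=8):
    """Average several forward passes, each dropping a random subset of cells."""
    wb = binarize(w)
    outs = []
    for _ in range(passes):
        mask = rng.random(wb.shape) >= p_drop  # keep each cell with prob 1 - p_drop
        outs.append(x @ (wb * mask))
    return np.mean(outs, axis=0)

x = rng.standard_normal(16)
w = rng.standard_normal((16, 4))
print(inference_with_dropout(x, w))
```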

Article Computer Science, Hardware & Architecture

SIAM: Chiplet-based Scalable In-Memory Acceleration with Mesh for Deep Neural Networks

Gokul Krishnan et al.

Summary: This study introduces a new benchmarking simulator, SIAM, to evaluate the performance of chiplet-based IMC architectures and explore the potential of this paradigm shift in IMC architecture design.

ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS (2021)

Article Engineering, Electrical & Electronic

PCM-Based Analog Compute-In-Memory: Impact of Device Non-Idealities on Inference Accuracy

X. Sun et al.

Summary: The study investigates the impact of phase-change memory (PCM) device non-idealities on deep neural network (DNN) inference accuracy. Nonlinear I-V characteristics, resistance variation, read noise, and resistance drift are identified as the key factors degrading accuracy. Temperature-specific weight remapping, variation-aware training, and weight transfusion are proposed to mitigate the resulting accuracy loss; the main overhead of weight transfusion is the additional area needed to store pre-trained weights.

IEEE TRANSACTIONS ON ELECTRON DEVICES (2021)
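Variation-aware training, one of the mitigations discussed above, amounts to injecting device-like noise into the weights during the forward pass so the network learns to tolerate it. Below is a minimal sketch; the multiplicative Gaussian noise model and the sigma value are illustrative assumptions, not the paper's exact device model.

```python
import torch

class NoisyLinear(torch.nn.Linear):
    """Linear layer that perturbs its weights during training."""

    def __init__(self, *args, sigma=0.05, **kwargs):
        super().__init__(*args, **kwargs)
        self.sigma = sigma                      # relative device variation

    def forward(self, x):
        if self.training:
            # multiplicative noise approximating conductance variation
            noise = 1 + self.sigma * torch.randn_like(self.weight)
            return torch.nn.functional.linear(x, self.weight * noise, self.bias)
        return super().forward(x)               # clean weights at evaluation

layer = NoisyLinear(8, 4)
layer.train()
print(layer(torch.randn(2, 8)).shape)           # torch.Size([2, 4])
```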

Article Computer Science, Artificial Intelligence

Mixed-precision quantized neural networks with progressively decreasing bitwidth

Tianshu Chu et al.

Summary: Efficient model inference is crucial when deploying deep neural networks on resource-constrained platforms, and network quantization addresses this by using low-bit representations. By assigning progressively decreasing bitwidths to successive layers, a mixed-precision quantized neural network achieves a better trade-off between accuracy and compression.

PATTERN RECOGNITION (2021)
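The central idea can be sketched directly: give early layers more bits and later layers fewer, then quantize each layer to its assigned width. The linear schedule below is an assumed example, not the paper's learned assignment.

```python
import numpy as np

def decreasing_bitwidths(n_layers, hi=8, lo=2):
    """Linear schedule from hi bits (first layer) down to lo bits (last)."""
    return np.linspace(hi, lo, n_layers).round().astype(int)

def quantize(w, bits):
    """Uniform symmetric quantization to the given bitwidth."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

layers = [np.random.randn(16, 16) for _ in range(6)]
for w, bits in zip(layers, decreasing_bitwidths(len(layers))):
    wq = quantize(w, bits)
    print(bits, "bits, quantization MSE:", float(np.mean((w - wq) ** 2)))
```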

Article Computer Science, Artificial Intelligence

A Learning Framework for n-Bit Quantized Neural Networks Toward FPGAs

Jun Chen et al.

Summary: This article introduces a learning framework for n-bit QNNs whose weights are constrained to powers of two, proposing a reconstructed gradient function to address gradient vanishing. It also presents a new QNN structure, n-BQ-NN, that replaces multiply operations with shift operations, together with a shift vector processing element (SVPE) array for improved efficiency on FPGAs. Experiments show accuracies comparable to the original full-precision models, while outperforming typical low-precision QNNs in speed and energy consumption.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2021)
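Why power-of-two weights let shifts replace multiplies: if a weight is +/- 2^k, then x * w is just x shifted by k bits (with a sign). The sketch below uses a generic nearest-power-of-two rounding as a stand-in for the paper's exact quantizer; the exponent range is an assumption.

```python
import numpy as np

def pow2_quantize(w, k_min=-4, k_max=0):
    """Round |w| to the nearest power of two in [2^k_min, 2^k_max]."""
    k = np.clip(np.round(np.log2(np.abs(w) + 1e-12)), k_min, k_max).astype(int)
    return np.sign(w).astype(int), k

def shift_dot(x_int, signs, exponents):
    """Dot product via shifts: x * 2^k == x << k for integer x and k >= 0."""
    acc = 0
    for x, s, k in zip(x_int, signs, exponents):
        term = (x << k) if k >= 0 else (x >> -k)   # shift replaces multiply
        acc += s * term
    return acc

w = np.array([0.24, -0.51, 0.98])
signs, ks = pow2_quantize(w)                 # exponents ~ [-2, -1, 0]
print(shift_dot([8, 4, 2], signs, ks))       # (8>>2) - (4>>1) + 2 = 2
```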

Article Engineering, Electrical & Electronic

Layer-Specific Optimization for Mixed Data Flow With Mixed Precision in FPGA Design for CNN-Based Object Detectors

Duy Thanh Nguyen et al.

Summary: This paper proposes a layer-specific hardware optimization scheme for CNNs that uses mixed data flow and mixed precision to significantly reduce off-chip access and model size while maintaining accuracy. Bayesian optimization selects the optimal sparsity for each layer, striking a balanced trade-off between accuracy and compression.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2021)
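Per-layer sparsity selection by Bayesian optimization can be sketched with an off-the-shelf optimizer. The objective below is a mock accuracy/size proxy, and the use of scikit-optimize's gp_minimize is my assumption standing in for the paper's actual toolchain and metrics.

```python
import numpy as np
from skopt import gp_minimize   # Gaussian-process Bayesian optimization

n_layers = 4

def objective(sparsities):
    """Mock trade-off: pruning shrinks the model but costs accuracy per layer."""
    s = np.array(sparsities)
    # assumed sensitivities: later layers tolerate more pruning
    accuracy_loss = np.sum(s ** 2 * np.array([1.0, 0.5, 0.3, 0.2]))
    size = np.sum(1 - s)
    return float(accuracy_loss + 0.5 * size)

space = [(0.0, 0.95)] * n_layers            # per-layer sparsity in [0, 0.95]
res = gp_minimize(objective, space, n_calls=25, random_state=0)
print("chosen per-layer sparsity:", np.round(res.x, 2))
```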

Article Computer Science, Artificial Intelligence

Exploiting Retraining-Based Mixed-Precision Quantization for Low-Cost DNN Accelerator Design

Nahsung Kim et al.

Summary: This article presents a retraining-based mixed-precision quantization approach and a customized DNN accelerator for high energy efficiency. By assigning additional bits to weights that switch frequently during retraining, and mitigating gradient noise with a lower learning rate, the proposed quantization achieves a better compression ratio and larger energy savings than existing methods. Experiments with the VGG-9 model on the CIFAR-10 dataset show improved accuracy and energy efficiency with the proposed quantization method and accelerator.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2021)
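The selection heuristic described above can be sketched as follows: track how often each quantized weight flips between retraining steps, then grant extra bits to the most frequently switching weights. The mock update loop, threshold, and bit choices are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def quantize(w, bits):
    """Return integer quantization codes for symmetric uniform quantization."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale)

w = rng.standard_normal(256)
switch_count = np.zeros_like(w)
prev_q = quantize(w, bits=4)

for _ in range(20):                                  # mock retraining steps
    w += 0.02 * rng.standard_normal(w.shape)         # stand-in for gradient updates
    q = quantize(w, bits=4)
    switch_count += (q != prev_q)                    # count quantized-value flips
    prev_q = q

# top 10% most frequently switching weights get 8 bits, the rest keep 4
threshold = np.quantile(switch_count, 0.9)
bits_per_weight = np.where(switch_count > threshold, 8, 4)
print("weights promoted to 8 bits:", int((bits_per_weight == 8).sum()))
```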

Proceedings Paper Computer Science, Hardware & Architecture

Mixed Precision Quantization for ReRAM-based DNN Inference Accelerators

Sitao Huang et al.

Summary: This study proposes a mixed-precision quantization scheme for ReRAM-based DNN inference accelerators that significantly reduces inference latency and energy consumption at a small cost in accuracy. The scheme jointly applies weight, input, and partial-sum quantization to each DNN layer and includes an automated quantization flow, powered by deep reinforcement learning, that searches for the optimal configuration.

2021 26TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC) (2021)
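Partial-sum quantization, the third knob alongside weight and input quantization, arises because each crossbar tile produces a partial sum that must pass through a low-bit ADC. The sketch below illustrates that effect in a tiled matrix-vector product; the tile size and ADC bitwidth are assumed values, not the paper's searched configuration.

```python
import numpy as np

def quant_partial_sum(ps, bits=6):
    """Clip and round a partial sum to what a low-bit ADC could resolve."""
    lim = 2 ** (bits - 1) - 1
    return np.clip(np.round(ps), -lim, lim)

def crossbar_matmul(x, W, rows_per_xbar=64, adc_bits=6):
    """Split W row-wise into crossbar-sized tiles; quantize each tile's output."""
    out = np.zeros(W.shape[1])
    for r in range(0, W.shape[0], rows_per_xbar):
        ps = x[r:r + rows_per_xbar] @ W[r:r + rows_per_xbar]  # one crossbar's sum
        out += quant_partial_sum(ps, adc_bits)   # ADC quantizes each partial sum
    return out

rng = np.random.default_rng(0)
x = rng.random(256)                              # activations after input quantization
W = rng.choice([-1.0, 0.0, 1.0], size=(256, 8))  # low-bit weights
print(crossbar_matmul(x, W))
```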

Proceedings Paper Engineering, Electrical & Electronic

eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded-Dynamic-Memory Array Realizing Adaptive Data Converters and Charge-Domain Computing

Shanshan Xie et al.

2021 IEEE INTERNATIONAL SOLID-STATE CIRCUITS CONFERENCE (ISSCC) (2021)

Article Engineering, Electrical & Electronic

A 28-nm Compute SRAM With Bit-Serial Logic/Arithmetic Operations for Programmable In-Memory Vector Computing

Jingcheng Wang et al.

IEEE JOURNAL OF SOLID-STATE CIRCUITS (2020)

Article Multidisciplinary Sciences

Fully hardware-implemented memristor convolutional neural network

Peng Yao et al.

NATURE (2020)

Review Nanoscience & Nanotechnology

Memory devices and applications for in-memory computing

Abu Sebastian et al.

NATURE NANOTECHNOLOGY (2020)

Article Computer Science, Hardware & Architecture

PANTHER: A Programmable Architecture for Neural Network Training Harnessing Energy-Efficient ReRAM

Aayush Ankit et al.

IEEE TRANSACTIONS ON COMPUTERS (2020)

Review Engineering, Electrical & Electronic

Neuro-inspired computing chips

Wenqiang Zhang et al.

NATURE ELECTRONICS (2020)

Article Computer Science, Hardware & Architecture

ReLeQ: A Reinforcement Learning Approach for Automatic Deep Quantization of Neural Networks

Ahmed T. Elthakeb et al.

IEEE MICRO (2020)

Article Engineering, Electrical & Electronic

High-Throughput In-Memory Computing for Binary Deep Neural Networks With Monolithically Integrated RRAM and 90-nm CMOS

Shihui Yin et al.

IEEE TRANSACTIONS ON ELECTRON DEVICES (2020)

Article Engineering, Electrical & Electronic

Resistive Crossbars as Approximate Hardware Building Blocks for Machine Learning: Opportunities and Challenges

Indranil Chakraborty et al.

PROCEEDINGS OF THE IEEE (2020)

Proceedings Paper Computer Science, Artificial Intelligence

Low Power In-Memory Implementation of Ternary Neural Networks with Resistive RAM-Based Synapse

A. Laborieux et al.

2020 2ND IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2020) (2020)

Article Automation & Control Systems

Automating Analogue AI Chip Design with Genetic Search

Olga Krestinskaya et al.

ADVANCED INTELLIGENT SYSTEMS (2020)

Review Automation & Control Systems

Device and Circuit Architectures for In-Memory Computing

Daniele Ielmini et al.

ADVANCED INTELLIGENT SYSTEMS (2020)

Article Computer Science, Information Systems

IR-QNN Framework: An IR Drop-Aware Offline Training of Quantized Crossbar Arrays

Mohammed E. Fouda et al.

IEEE ACCESS (2020)

Article Computer Science, Artificial Intelligence

In situ training of feed-forward and recurrent convolutional memristor networks

Zhongrui Wang et al.

NATURE MACHINE INTELLIGENCE (2019)

Article Engineering, Electrical & Electronic

BRein Memory: A Single-Chip Binary/Ternary Reconfigurable in-Memory Deep Neural Network Accelerator Achieving 1.4 TOPS at 0.6 W

Kota Ando et al.

IEEE JOURNAL OF SOLID-STATE CIRCUITS (2018)

Article Engineering, Electrical & Electronic

A Drift-Tolerant Read/Write Scheme for Multilevel Memristor Memory

Yalcin Yilmaz et al.

IEEE TRANSACTIONS ON NANOTECHNOLOGY (2017)

Article Computer Science, Hardware & Architecture

Modeling Size Limitations of Resistive Crossbar Array With Cell Selectors

Albert Ciprut et al.

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS (2017)

Article Engineering, Electrical & Electronic

Memristor-based memory: The sneak paths problem and solutions

Mohammed Affan Zidan et al.

MICROELECTRONICS JOURNAL (2013)