4.4 Article

Towards Efficient In-Memory Computing Hardware for Quantized Neural Networks: State-of-the-Art, Open Challenges and Perspectives

Related References

Note: only a subset of the references is listed here; see the original article for the complete bibliography.
Article Computer Science, Hardware & Architecture

FAT: An In-Memory Accelerator With Fast Addition for Ternary Weight Neural Networks

Shien Zhu et al.

Summary: This article proposes FAT, a novel in-memory computing (IMC) accelerator for ternary weight networks (TWNs) that speeds up inference by exploiting weight sparsity and a fast addition scheme.

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS (2023)
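The arithmetic that ternary-weight accelerators such as FAT exploit can be seen in a few lines: with weights restricted to {-1, 0, +1}, a dot product needs no multipliers, and zero weights cost nothing. The sketch below is a minimal illustration of that idea, not FAT's hardware design; all names are illustrative.

```python
import numpy as np

def ternary_dot(inputs: np.ndarray, weights: np.ndarray) -> int:
    """Dot product with weights constrained to {-1, 0, +1}.

    Zero weights are skipped entirely (sparsity), and the remaining
    terms need only addition/subtraction, never multiplication.
    """
    assert set(np.unique(weights)).issubset({-1, 0, 1})
    acc = 0
    for x, w in zip(inputs, weights):
        if w == 0:                        # sparsity: zero weights are skipped
            continue
        acc += x if w == 1 else -x        # +1 -> add, -1 -> subtract
    return acc

x = np.array([3, 1, 4, 1, 5])
w = np.array([1, 0, -1, 0, 1])            # ternary weights
print(ternary_dot(x, w))                  # 3 - 4 + 5 = 4
```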

Article Engineering, Electrical & Electronic

Two-Way Transpose Multibit 6T SRAM Computing-in-Memory Macro for Inference-Training AI Edge Chips

Jian-Wei Su et al.

Summary: This article introduces an SRAM-based computing-in-memory (CIM) macro for energy-efficient multiply-and-accumulate (MAC) operations in AI edge devices. Using a two-way transpose (TWT) multiply cell and a novel read scheme, the macro achieves strong resistance to process variation and high energy efficiency for MAC operations across a range of input, weight, and output bit widths.

IEEE JOURNAL OF SOLID-STATE CIRCUITS (2022)
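How a CIM macro supports "various bit lengths" can be illustrated in software: a multibit MAC decomposes into 1-bit bit-plane dot products, each weighted by a power of two. This is a generic bit-serial sketch of that decomposition, not Su et al.'s circuit; the function and parameters are assumptions for illustration.

```python
import numpy as np

def bitserial_mac(inputs, weights, in_bits=4, w_bits=4):
    """Multibit MAC composed from binary bit-plane dot products."""
    acc = 0
    for i in range(in_bits):                # input bit planes (LSB first)
        x_plane = (inputs >> i) & 1
        for j in range(w_bits):             # weight bit planes
            w_plane = (weights >> j) & 1
            # each binary partial product is shift-weighted by 2^(i+j)
            acc += int(x_plane @ w_plane) << (i + j)
    return acc

x = np.array([5, 3, 7], dtype=np.int64)     # 4-bit unsigned inputs
w = np.array([2, 6, 1], dtype=np.int64)     # 4-bit unsigned weights
assert bitserial_mac(x, w) == int(x @ w)    # 10 + 18 + 7 = 35
```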

Article Engineering, Electrical & Electronic

An Embedded NAND Flash-Based Compute-In-Memory Array Demonstrated in a Standard Logic Process

Minsu Kim et al.

Summary: Inspired by the 3D NAND flash array structure, the authors experimentally demonstrate neural network hardware with high recognition accuracy and low current variation.

IEEE JOURNAL OF SOLID-STATE CIRCUITS (2022)

Article Computer Science, Hardware & Architecture

CMQ: Crossbar-Aware Neural Network Mixed-Precision Quantization via Differentiable Architecture Search

Jie Peng et al.

Summary: This study proposes a crossbar-aware mixed-precision quantization scheme that improves the accuracy and robustness of neural networks. By dynamically adjusting the group size and running a fine-grained precision search flow, the method delivers significant gains in inference accuracy and resource savings. Experiments further show that the mixed-precision network with noise-adaptation training is more robust to noise than fixed-precision networks.

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS (2022)
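The core of a differentiable precision search can be shown compactly: each layer keeps learnable logits over candidate bitwidths, and the forward pass mixes the quantized branches with softmax weights so the precision choice itself receives gradients. The following is a minimal DARTS-style sketch in the spirit of CMQ, not a reproduction of it; the quantizer and loss are stand-in assumptions.

```python
import torch
import torch.nn.functional as F

def fake_quant(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform symmetric fake-quantization with a straight-through estimator."""
    scale = w.abs().max() / (2 ** (bits - 1) - 1)
    q = torch.round(w / scale).clamp(-(2 ** (bits - 1) - 1), 2 ** (bits - 1) - 1)
    return w + (q * scale - w).detach()   # STE: identity gradient through rounding

candidate_bits = [2, 4, 8]
w = torch.randn(64, 64, requires_grad=True)                    # a layer's weights
alpha = torch.zeros(len(candidate_bits), requires_grad=True)   # bitwidth logits

probs = F.softmax(alpha, dim=0)
w_mixed = sum(p * fake_quant(w, b) for p, b in zip(probs, candidate_bits))
loss = w_mixed.square().mean()        # stand-in for the task loss
loss.backward()                       # gradients reach both w and alpha
print(probs.detach(), alpha.grad)
```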

Proceedings Paper Computer Science, Artificial Intelligence

NAX: Neural Architecture and Memristive Xbar based Accelerator Co-design

Shubham Negi et al.

Summary: Integrating neural architecture search (NAS) with memristive crossbar array (MCA) based in-memory computing (IMC) accelerators remains an open problem. This study proposes NAX, an efficient NAS engine that co-designs the neural network and the IMC hardware architecture to achieve optimal trade-offs between hardware efficiency and application accuracy.

PROCEEDINGS OF THE 59TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC 2022 (2022)

Proceedings Paper Engineering, Electrical & Electronic

Towards Efficient RRAM-based Quantized Neural Networks Hardware: State-of-the-art and Open Issues

O. Krestinskaya et al.

Summary: This paper provides a comprehensive analysis of state-of-the-art RRAM-based QNN implementations, situating RRAM among technologies for efficient QNN hardware. It covers hardware and device challenges related to QNNs and discusses the main unsolved issues and possible future research directions.

2022 IEEE 22ND INTERNATIONAL CONFERENCE ON NANOTECHNOLOGY (NANO) (2022)

Article Engineering, Electrical & Electronic

Inference Dropouts in Binary Weighted Analog Memristive Crossbar

Alex James et al.

Summary: Stochastic dropout and weight binarization at the inference stage improve the energy efficiency and robustness of memristive crossbar accelerators, enabling reliable edge AI computing devices.

IEEE TRANSACTIONS ON NANOTECHNOLOGY (2022)
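A hedged sketch of the idea summarized above: binarize the weights for the crossbar and apply stochastic dropout at inference time, averaging a few noisy forward passes. The dropout rate, pass count, and shapes below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def binarize(w):
    return np.sign(w) + (w == 0)              # weights in {-1, +1}

def inference_with_dropout(x, w, p_drop=0.1, passes=8):
    """Average several forward passes, each dropping a random subset of cells."""
    wb = binarize(w)
    outs = []
    for _ in range(passes):
        mask = rng.random(wb.shape) >= p_drop  # keep each cell with prob 1 - p_drop
        outs.append(x @ (wb * mask))
    return np.mean(outs, axis=0)

x = rng.standard_normal(16)
w = rng.standard_normal((16, 4))
print(inference_with_dropout(x, w))
```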

Article Computer Science, Hardware & Architecture

SIAM: Chiplet-based Scalable In-Memory Acceleration with Mesh for Deep Neural Networks

Gokul Krishnan et al.

Summary: This study introduces a new benchmarking simulator, SIAM, to evaluate the performance of chiplet-based IMC architectures and explore the potential of this paradigm shift in IMC architecture design.

ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS (2021)

Article Engineering, Electrical & Electronic

PCM-Based Analog Compute-In-Memory: Impact of Device Non-Idealities on Inference Accuracy

X. Sun et al.

Summary: The study investigates the impact of phase-change memory (PCM) device non-idealities on deep neural network (DNN) inference accuracy. Nonlinear I-V characteristics, resistance variation, read noise, and resistance drift are identified as the key factors degrading accuracy. Temperature-specific weight remapping, variation-aware training, and weight transfusion are proposed to mitigate the resulting accuracy loss; the main overhead of weight transfusion is the additional area needed to store pre-trained weights.

IEEE TRANSACTIONS ON ELECTRON DEVICES (2021)
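Variation-aware training, one of the mitigations discussed above, amounts to injecting device-like noise into the weights during the forward pass so the network learns to tolerate it. Below is a minimal sketch; the multiplicative Gaussian noise model and the sigma value are illustrative assumptions, not the paper's exact device model.

```python
import torch

class NoisyLinear(torch.nn.Linear):
    """Linear layer that perturbs its weights during training."""

    def __init__(self, *args, sigma=0.05, **kwargs):
        super().__init__(*args, **kwargs)
        self.sigma = sigma                      # relative device variation

    def forward(self, x):
        if self.training:
            # multiplicative noise approximating conductance variation
            noise = 1 + self.sigma * torch.randn_like(self.weight)
            return torch.nn.functional.linear(x, self.weight * noise, self.bias)
        return super().forward(x)               # clean weights at evaluation

layer = NoisyLinear(8, 4)
layer.train()
print(layer(torch.randn(2, 8)).shape)           # torch.Size([2, 4])
```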

Article Computer Science, Artificial Intelligence

Mixed-precision quantized neural networks with progressively decreasing bitwidth

Tianshu Chu et al.

Summary: Efficient model inference is crucial when deploying deep neural networks on resource-constrained platforms, and network quantization addresses this by using low-bit representations. By assigning progressively decreasing bitwidths to successive layers, a mixed-precision quantized neural network achieves a better trade-off between accuracy and compression.

PATTERN RECOGNITION (2021)
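The central idea can be sketched directly: give early layers more bits and later layers fewer, then quantize each layer to its assigned width. The linear schedule below is an assumed example, not the paper's learned assignment.

```python
import numpy as np

def decreasing_bitwidths(n_layers, hi=8, lo=2):
    """Linear schedule from hi bits (first layer) down to lo bits (last)."""
    return np.linspace(hi, lo, n_layers).round().astype(int)

def quantize(w, bits):
    """Uniform symmetric quantization to the given bitwidth."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

layers = [np.random.randn(16, 16) for _ in range(6)]
for w, bits in zip(layers, decreasing_bitwidths(len(layers))):
    wq = quantize(w, bits)
    print(bits, "bits, quantization MSE:", float(np.mean((w - wq) ** 2)))
```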

Article Computer Science, Artificial Intelligence

A Learning Framework for n-Bit Quantized Neural Networks Toward FPGAs

Jun Chen et al.

Summary: This article introduces a learning framework for n-bit QNNs whose weights are constrained to powers of two, proposing a reconstructed gradient function to address gradient vanishing. It also presents a new QNN structure, n-BQ-NN, that replaces multiply operations with shift operations, together with a shift vector processing element (SVPE) array for improved efficiency on FPGAs. Experiments show accuracies comparable to the original full-precision models, while outperforming typical low-precision QNNs in speed and energy consumption.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2021)
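Why power-of-two weights let shifts replace multiplies: if a weight is +/- 2^k, then x * w is just x shifted by k bits (with a sign). The sketch below uses a generic nearest-power-of-two rounding as a stand-in for the paper's exact quantizer; the exponent range is an assumption.

```python
import numpy as np

def pow2_quantize(w, k_min=-4, k_max=0):
    """Round |w| to the nearest power of two in [2^k_min, 2^k_max]."""
    k = np.clip(np.round(np.log2(np.abs(w) + 1e-12)), k_min, k_max).astype(int)
    return np.sign(w).astype(int), k

def shift_dot(x_int, signs, exponents):
    """Dot product via shifts: x * 2^k == x << k for integer x and k >= 0."""
    acc = 0
    for x, s, k in zip(x_int, signs, exponents):
        term = (x << k) if k >= 0 else (x >> -k)   # shift replaces multiply
        acc += s * term
    return acc

w = np.array([0.24, -0.51, 0.98])
signs, ks = pow2_quantize(w)                 # exponents ~ [-2, -1, 0]
print(shift_dot([8, 4, 2], signs, ks))       # (8>>2) - (4>>1) + 2 = 2
```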

Article Engineering, Electrical & Electronic

Layer-Specific Optimization for Mixed Data Flow With Mixed Precision in FPGA Design for CNN-Based Object Detectors

Duy Thanh Nguyen et al.

Summary: This paper proposes a layer-specific hardware optimization scheme for CNNs that uses mixed data flow and mixed precision to significantly reduce off-chip access and model size while maintaining accuracy. Bayesian optimization selects the optimal sparsity for each layer, striking a balanced trade-off between accuracy and compression.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2021)
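Per-layer sparsity selection by Bayesian optimization can be sketched with an off-the-shelf optimizer. The objective below is a mock accuracy/size proxy, and the use of scikit-optimize's gp_minimize is my assumption standing in for the paper's actual toolchain and metrics.

```python
import numpy as np
from skopt import gp_minimize   # Gaussian-process Bayesian optimization

n_layers = 4

def objective(sparsities):
    """Mock trade-off: pruning shrinks the model but costs accuracy per layer."""
    s = np.array(sparsities)
    # assumed sensitivities: later layers tolerate more pruning
    accuracy_loss = np.sum(s ** 2 * np.array([1.0, 0.5, 0.3, 0.2]))
    size = np.sum(1 - s)
    return float(accuracy_loss + 0.5 * size)

space = [(0.0, 0.95)] * n_layers            # per-layer sparsity in [0, 0.95]
res = gp_minimize(objective, space, n_calls=25, random_state=0)
print("chosen per-layer sparsity:", np.round(res.x, 2))
```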

Article Computer Science, Artificial Intelligence

Exploiting Retraining-Based Mixed-Precision Quantization for Low-Cost DNN Accelerator Design

Nahsung Kim et al.

Summary: This article presents a retraining-based mixed-precision quantization approach and a customized DNN accelerator for high energy efficiency. By assigning additional bits to weights that switch frequently during retraining, and mitigating gradient noise with a lower learning rate, the proposed quantization achieves a better compression ratio and larger energy savings than existing methods. Experiments with the VGG-9 model on the CIFAR-10 dataset show improved accuracy and energy efficiency with the proposed quantization method and accelerator.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2021)
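The selection heuristic described above can be sketched as follows: track how often each quantized weight flips between retraining steps, then grant extra bits to the most frequently switching weights. The mock update loop, threshold, and bit choices are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def quantize(w, bits):
    """Return integer quantization codes for symmetric uniform quantization."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale)

w = rng.standard_normal(256)
switch_count = np.zeros_like(w)
prev_q = quantize(w, bits=4)

for _ in range(20):                                  # mock retraining steps
    w += 0.02 * rng.standard_normal(w.shape)         # stand-in for gradient updates
    q = quantize(w, bits=4)
    switch_count += (q != prev_q)                    # count quantized-value flips
    prev_q = q

# top 10% most frequently switching weights get 8 bits, the rest keep 4
threshold = np.quantile(switch_count, 0.9)
bits_per_weight = np.where(switch_count > threshold, 8, 4)
print("weights promoted to 8 bits:", int((bits_per_weight == 8).sum()))
```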

Proceedings Paper Computer Science, Hardware & Architecture

Mixed Precision Quantization for ReRAM-based DNN Inference Accelerators

Sitao Huang et al.

Summary: This study proposes a mixed-precision quantization scheme for ReRAM-based DNN inference accelerators that significantly reduces inference latency and energy consumption at a small cost in accuracy. The scheme jointly applies weight, input, and partial-sum quantization to each DNN layer and includes an automated quantization flow, powered by deep reinforcement learning, that searches for the optimal configuration.

2021 26TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC) (2021)
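Partial-sum quantization, the third knob alongside weight and input quantization, arises because each crossbar tile produces a partial sum that must pass through a low-bit ADC. The sketch below illustrates that effect in a tiled matrix-vector product; the tile size and ADC bitwidth are assumed values, not the paper's searched configuration.

```python
import numpy as np

def quant_partial_sum(ps, bits=6):
    """Clip and round a partial sum to what a low-bit ADC could resolve."""
    lim = 2 ** (bits - 1) - 1
    return np.clip(np.round(ps), -lim, lim)

def crossbar_matmul(x, W, rows_per_xbar=64, adc_bits=6):
    """Split W row-wise into crossbar-sized tiles; quantize each tile's output."""
    out = np.zeros(W.shape[1])
    for r in range(0, W.shape[0], rows_per_xbar):
        ps = x[r:r + rows_per_xbar] @ W[r:r + rows_per_xbar]  # one crossbar's sum
        out += quant_partial_sum(ps, adc_bits)   # ADC quantizes each partial sum
    return out

rng = np.random.default_rng(0)
x = rng.random(256)                              # activations after input quantization
W = rng.choice([-1.0, 0.0, 1.0], size=(256, 8))  # low-bit weights
print(crossbar_matmul(x, W))
```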

Proceedings Paper Engineering, Electrical & Electronic

eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded-Dynamic-Memory Array Realizing Adaptive Data Converters and Charge-Domain Computing

Shanshan Xie et al.

2021 IEEE INTERNATIONAL SOLID-STATE CIRCUITS CONFERENCE (ISSCC) (2021)

Article Engineering, Electrical & Electronic

A 28-nm Compute SRAM With Bit-Serial Logic/Arithmetic Operations for Programmable In-Memory Vector Computing

Jingcheng Wang et al.

IEEE JOURNAL OF SOLID-STATE CIRCUITS (2020)

Article Multidisciplinary Sciences

Fully hardware-implemented memristor convolutional neural network

Peng Yao et al.

NATURE (2020)

Review Nanoscience & Nanotechnology

Memory devices and applications for in-memory computing

Abu Sebastian et al.

NATURE NANOTECHNOLOGY (2020)

Article Computer Science, Hardware & Architecture

PANTHER: A Programmable Architecture for Neural Network Training Harnessing Energy-Efficient ReRAM

Aayush Ankit et al.

IEEE TRANSACTIONS ON COMPUTERS (2020)

Review Engineering, Electrical & Electronic

Neuro-inspired computing chips

Wenqiang Zhang et al.

NATURE ELECTRONICS (2020)

Article Computer Science, Hardware & Architecture

ReLeQ: A Reinforcement Learning Approach for Automatic Deep Quantization of Neural Networks

Ahmed T. Elthakeb et al.

IEEE MICRO (2020)

Article Engineering, Electrical & Electronic

High-Throughput In-Memory Computing for Binary Deep Neural Networks With Monolithically Integrated RRAM and 90-nm CMOS

Shihui Yin et al.

IEEE TRANSACTIONS ON ELECTRON DEVICES (2020)

Article Engineering, Electrical & Electronic

Resistive Crossbars as Approximate Hardware Building Blocks for Machine Learning: Opportunities and Challenges

Indranil Chakraborty et al.

PROCEEDINGS OF THE IEEE (2020)

Proceedings Paper Computer Science, Artificial Intelligence

Low Power In-Memory Implementation of Ternary Neural Networks with Resistive RAM-Based Synapse

A. Laborieux et al.

2020 2ND IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2020) (2020)

Article Automation & Control Systems

Automating Analogue AI Chip Design with Genetic Search

Olga Krestinskaya et al.

ADVANCED INTELLIGENT SYSTEMS (2020)

Review Automation & Control Systems

Device and Circuit Architectures for In-Memory Computing

Daniele Ielmini et al.

ADVANCED INTELLIGENT SYSTEMS (2020)

Article Computer Science, Information Systems

IR-QNN Framework: An IR Drop-Aware Offline Training of Quantized Crossbar Arrays

Mohammed E. Fouda et al.

IEEE ACCESS (2020)

Article Computer Science, Artificial Intelligence

In situ training of feed-forward and recurrent convolutional memristor networks

Zhongrui Wang et al.

NATURE MACHINE INTELLIGENCE (2019)

Article Engineering, Electrical & Electronic

BRein Memory: A Single-Chip Binary/Ternary Reconfigurable in-Memory Deep Neural Network Accelerator Achieving 1.4 TOPS at 0.6 W

Kota Ando et al.

IEEE JOURNAL OF SOLID-STATE CIRCUITS (2018)

Article Engineering, Electrical & Electronic

A Drift-Tolerant Read/Write Scheme for Multilevel Memristor Memory

Yalcin Yilmaz et al.

IEEE TRANSACTIONS ON NANOTECHNOLOGY (2017)

Article Computer Science, Hardware & Architecture

Modeling Size Limitations of Resistive Crossbar Array With Cell Selectors

Albert Ciprut et al.

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS (2017)

Article Engineering, Electrical & Electronic

Memristor-based memory: The sneak paths problem and solutions

Mohammed Affan Zidan et al.

MICROELECTRONICS JOURNAL (2013)