4.6 Article

A Comprehensive Survey on Model Quantization for Deep Neural Networks in Image Classification

Related references

Note: only a subset of the references is listed here; download the original article for the complete reference information.
Article Computer Science, Theory & Methods

A Survey on Aspect-Based Sentiment Classification

Gianni Brauwers et al.

Summary: This survey reviews the current state of research on aspect-based sentiment classification (ABSC). A novel taxonomy categorizing ABSC models into three major categories is proposed, and summaries of reported model performances are provided. The paper discusses state-of-the-art ABSC models such as transformer-based models and hybrid deep learning models. Various techniques for representing model inputs and evaluating model outputs are reviewed. The paper also identifies trends in ABSC research and provides a discussion on future advancements.

ACM COMPUTING SURVEYS (2023)

Article Computer Science, Artificial Intelligence

Deploying deep learning networks based advanced techniques for image processing on FPGA platform

Refka Ghodhbani et al.

Summary: Convolutional neural networks (CNN) have become dominant in various fields, but their high computation and memory requirements pose challenges. Low-precision representations of neurons and inputs can offer scalability and efficiency while sacrificing accuracy. Recent studies have shown that high accuracy can still be achieved even with binary values, and this paper reviews works that explore design space and automate the building of customizable inference engines for image processing on FPGAs.

NEURAL COMPUTING & APPLICATIONS (2023)

Review Computer Science, Interdisciplinary Applications

Model Compression for Deep Neural Networks: A Survey

Zhuo Li et al.

Summary: With the rapid development of deep learning, deep neural networks (DNNs) have been widely applied in computer vision tasks. However, advanced DNN models have become complex, leading to high memory usage and computation demands. To address these issues, model compression has become a research focus. This study analyzes various model compression methods for reducing device storage requirements, speeding up model inference, and simplifying model deployment.

COMPUTERS (2023)

Article Computer Science, Information Systems

A Survey on Efficient Convolutional Neural Networks and Hardware Acceleration

Deepak Ghimire et al.

Summary: This review examines the remarkable performance of deep-learning-based representations in academia and industry over the past decade. It focuses on techniques for improving the efficiency of deep learning models, including quantized/binarized models, optimized architectures, and deployment on resource-constrained systems. The review also discusses practical applications of efficient CNNs on various hardware architectures and platforms.

ELECTRONICS (2022)

Article Computer Science, Artificial Intelligence

Optimization-Based Post-Training Quantization With Bit-Split and Stitching

Peisong Wang et al.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2022)

Article Computer Science, Hardware & Architecture

Quantized Sparse Training: A Unified Trainable Framework for Joint Pruning and Quantization in DNNs

Jun-Hyung Park et al.

Summary: In this study, a compression framework called quantized sparse training is proposed, which prunes and quantizes networks simultaneously in a unified training process. Empirical results show that the proposed methodology outperforms the state-of-the-art baselines in terms of both model size and accuracy.

ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Explicit Model Size Control and Relaxation via Smooth Regularization for Mixed-Precision Quantization

Vladimir Chikin et al.

Summary: While deep neural network quantization reduces computational and storage costs, it also leads to a drop in model accuracy. To overcome this, using different quantization bit-widths for different layers is a possible solution. In this study, a novel technique for explicit complexity control of mixed-precision quantized DNNs is introduced, which utilizes smooth optimization and can be applied to any neural network architecture.

COMPUTER VISION, ECCV 2022, PT XII (2022)

Proceedings Paper Computer Science, Artificial Intelligence

BASQ: Branch-wise Activation-clipping Search Quantization for Sub-4-bit Neural Networks

Han-Byul Kim et al.

Summary: This paper proposes BASQ, a new method for low-bit activation quantization, along with a novel block structure suited to both MobileNet and ResNet architectures. The proposed method remains competitive across various low precisions, outperforming existing methods in accuracy on ImageNet.

COMPUTER VISION, ECCV 2022, PT XII (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation

Zechun Liu et al.

Summary: This study proposes Nonuniform-to-Uniform Quantization (N2UQ), which retains the strong representational ability of nonuniform methods while remaining hardware-friendly and efficient. It achieves this by learning flexible, inequidistant input thresholds and quantizing real-valued inputs into equidistant output levels.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) (2022)
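A minimal sketch of the idea summarized above: inequidistant (in practice, learned) input thresholds map real-valued inputs onto equidistant integer output levels. This is an illustration of the nonuniform-to-uniform mapping only, not the authors' implementation; the threshold values are made up for the example.

```python
import numpy as np

def n2u_quantize(x, thresholds):
    """Map real-valued inputs to equidistant integer levels via
    possibly inequidistant thresholds (illustrative sketch).

    thresholds: 1-D array of K threshold values; an input between
    thresholds[i-1] and thresholds[i] maps to output level i, so the
    K+1 output levels {0, 1, ..., K} stay uniformly spaced and
    hardware-friendly even though the input bins are not.
    """
    return np.searchsorted(np.sort(thresholds), x)

# 2-bit example: 3 inequidistant thresholds -> 4 uniform output levels
t = np.array([0.1, 0.3, 0.9])
x = np.array([-0.5, 0.2, 0.5, 1.2])
print(n2u_quantize(x, t))  # [0 1 2 3]
```

Because the output levels are equidistant, downstream arithmetic can use ordinary integer operations, which is the hardware-friendliness the summary refers to.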

Article Computer Science, Artificial Intelligence

Pruning and quantization for deep neural network acceleration: A survey

Tailin Liang et al.

Summary: Deep neural networks have been widely used in computer vision applications, but their complex architectures pose challenges in real-time deployment due to high computation resources and energy costs. Network compression techniques such as pruning and quantization can help overcome these challenges by reducing redundant computations. Both techniques can be used independently or together to improve the efficiency and performance of deep neural networks.

NEUROCOMPUTING (2021)
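As a hedged sketch of how the two techniques this survey covers can be composed, the snippet below applies magnitude pruning followed by symmetric uniform quantization to a weight tensor. The sparsity ratio, bit-width, and thresholding rule are illustrative choices, not values from the paper.

```python
import numpy as np

def prune_then_quantize(w, sparsity=0.5, bits=4):
    """Illustrative composition of magnitude pruning and symmetric
    uniform quantization (example settings, not from the survey)."""
    # Pruning: zero out the smallest-magnitude fraction of weights.
    k = int(sparsity * w.size)
    if k > 0:
        thresh = np.sort(np.abs(w), axis=None)[k - 1]
        w = np.where(np.abs(w) <= thresh, 0.0, w)
    # Quantization: map surviving weights to a signed integer grid.
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax if np.any(w) else 1.0
    q = np.round(w / scale).astype(np.int8)  # stored integers
    return q, scale                          # dequantize as q * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 8)).astype(np.float32)
q, s = prune_then_quantize(w)
print(q.dtype, float(np.mean(q == 0)))  # int8 tensor, >= 50% zeros
```

Note that rounding can push additional small survivors to zero, so the realized sparsity may slightly exceed the pruning target.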

Proceedings Paper Computer Science, Artificial Intelligence

Layer Importance Estimation with Imprinting for Neural Network Quantization

Hongyang Liu et al.

Summary: Neural network quantization achieves high compression through fixed low bit-width representation, but mixed-precision quantization requires careful tuning. The proposed method introduces an accuracy-aware criterion for layer importance and applies imprinting per layer for a more interpretable bit-width configuration.

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021 (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Adaptive Binary-Ternary Quantization

Ryan Razani et al.

Summary: Neural network models are resource-intensive and difficult to deploy on devices with limited resources. Low-bit quantization such as binary and ternary quantization can alleviate this issue, with ternary quantization being the more accurate of the two. Mixed quantized models allow a trade-off between accuracy and memory footprint.

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021 (2021)
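To make the binary/ternary contrast above concrete, here is a sketch of threshold-based ternary weight quantization mapping weights to {-s, 0, +s}. The `delta_factor` heuristic is an assumption borrowed from common practice in the ternary-quantization literature, not this paper's rule.

```python
import numpy as np

def ternarize(w, delta_factor=0.7):
    """Ternary quantization sketch: map weights to {-s, 0, +s}.

    delta_factor is a heuristic (assumed here, not from the paper):
    weights with |w| below delta = delta_factor * mean(|w|) become 0;
    the rest share a single scale s fitted to the surviving weights.
    """
    delta = delta_factor * np.mean(np.abs(w))
    mask = np.abs(w) > delta                       # nonzero positions
    s = np.mean(np.abs(w[mask])) if mask.any() else 0.0
    return s * np.sign(w) * mask

w = np.array([0.9, -0.05, 0.4, -0.8])
print(ternarize(w))  # [ 0.7  0.   0.7 -0.7]
```

The extra zero state is what gives ternary models their accuracy edge over binary ones, at the cost of one additional representable value per weight.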

Article Computer Science, Artificial Intelligence

Bi-Real Net: Binarizing Deep Network Towards Real-Network Performance

Zechun Liu et al.

INTERNATIONAL JOURNAL OF COMPUTER VISION (2020)

Article Computer Science, Hardware & Architecture

Grow and Prune Compact, Fast, and Accurate LSTMs

Xiaoliang Dai et al.

IEEE TRANSACTIONS ON COMPUTERS (2020)

Article Computer Science, Artificial Intelligence

Binary neural networks: A survey

Haotong Qin et al.

PATTERN RECOGNITION (2020)

Article Engineering, Electrical & Electronic

Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey

Lei Deng et al.

PROCEEDINGS OF THE IEEE (2020)

Article Computer Science, Hardware & Architecture

TiM-DNN: Ternary In-Memory Accelerator for Deep Neural Networks

Shubham Jain et al.

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS (2020)

Article Engineering, Electrical & Electronic

Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices

Yu-Hsin Chen et al.

IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS (2019)

Article Computer Science, Theory & Methods

BSHIFT: A Low Cost Deep Neural Networks Accelerator

Yong Yu et al.

INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING (2019)

Article Mathematics

Blended coarse gradient descent for full quantization of deep neural networks

Penghang Yin et al.

RESEARCH IN THE MATHEMATICAL SCIENCES (2019)

Article Engineering, Electrical & Electronic

Model Compression and Acceleration for Deep Neural Networks: The principles, progress, and challenges

Yu Cheng et al.

IEEE SIGNAL PROCESSING MAGAZINE (2018)

Article Computer Science, Hardware & Architecture

Weighted Quantization-Regularization in DNNs for Weight Memory Minimization Toward HW Implementation

Matthias Wess et al.

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS (2018)

Review Computer Science, Information Systems

Recent advances in efficient computation of deep convolutional neural networks

Jian Cheng et al.

FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING (2018)

Proceedings Paper Computer Science, Artificial Intelligence

SpWA: An Efficient Sparse Winograd Convolutional Neural Networks Accelerator on FPGAs

Liqiang Lu et al.

2018 55TH ACM/ESDA/IEEE DESIGN AUTOMATION CONFERENCE (DAC) (2018)

Proceedings Paper Computer Science, Artificial Intelligence

Explicit Loss-Error-Aware Quantization for Low-Bit Deep Neural Networks

Aojun Zhou et al.

2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) (2018)

Article Computer Science, Artificial Intelligence

BinaryRelax: A Relaxation Approach for Training Deep Neural Networks with Quantized Weights

Penghang Yin et al.

SIAM JOURNAL ON IMAGING SCIENCES (2018)

Article Computer Science, Hardware & Architecture

Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks

Shu-Chang Zhou et al.

JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY (2017)

Article Engineering, Electrical & Electronic

Efficient Processing of Deep Neural Networks: A Tutorial and Survey

Vivienne Sze et al.

PROCEEDINGS OF THE IEEE (2017)

Article Computer Science, Hardware & Architecture

Structured Pruning of Deep Convolutional Neural Networks

Sajid Anwar et al.

ACM JOURNAL ON EMERGING TECHNOLOGIES IN COMPUTING SYSTEMS (2017)

Proceedings Paper Computer Science, Artificial Intelligence

Densely Connected Convolutional Networks

Gao Huang et al.

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) (2017)

Proceedings Paper Computer Science, Artificial Intelligence

Channel Pruning for Accelerating Very Deep Neural Networks

Yihui He et al.

2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) (2017)

Proceedings Paper Computer Science, Artificial Intelligence

Deep Learning with Low Precision by Half-wave Gaussian Quantization

Zhaowei Cai et al.

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) (2017)

Article Computer Science, Artificial Intelligence

Accelerating Very Deep Convolutional Networks for Classification and Detection

Xiangyu Zhang et al.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2016)

Article Computer Science, Artificial Intelligence

ImageNet Large Scale Visual Recognition Challenge

Olga Russakovsky et al.

INTERNATIONAL JOURNAL OF COMPUTER VISION (2015)

Article Computer Science, Information Systems

Channel-Level Acceleration of Deep Face Representations

Adam Polyak et al.

IEEE ACCESS (2015)

Proceedings Paper Computer Science, Hardware & Architecture

ShiDianNao: Shifting Vision Processing Closer to the Sensor

Zidong Du et al.

2015 ACM/IEEE 42ND ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA) (2015)

Proceedings Paper Computer Science, Hardware & Architecture

DaDianNao: A Machine-Learning Supercomputer

Yunji Chen et al.

2014 47TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO) (2014)

Article Engineering, Electrical & Electronic

Deep Neural Networks for Acoustic Modeling in Speech Recognition

Geoffrey Hinton et al.

IEEE SIGNAL PROCESSING MAGAZINE (2012)