Article

Quantization aware approximate multiplier and hardware accelerator for edge computing of deep learning applications

INTEGRATION-THE VLSI JOURNAL (2021)

Journal

INTEGRATION-THE VLSI JOURNAL

Volume 81, Pages 268-279

Publisher

ELSEVIER

DOI: 10.1016/j.vlsi.2021.08.001

Keywords

Approximate computing; Approximate multiplier; Hardware accelerator; Edge computing; Matrix multiplication

Categories

Computer Science, Hardware & Architecture; Engineering, Electrical & Electronic

Funding

Ministry of Electronics and Information Technology (MeitY), Government of India [MEITY-PHD-2145]

Summary

Approximate computing is a design methodology that accepts a slight loss in output accuracy in exchange for better performance and power efficiency in digital systems. The paper proposes an approximate radix-4 Booth multiplier and a hardware accelerator for deep learning applications on power-restricted devices. In experiments, average power consumption fell by 34% for matrix-vector multiplication (MVM) and by 40% for matrix-matrix multiplication (MMM) workloads, and performance improved 14x and 5x respectively over the conventional design.

Abstract

Approximate computing has emerged as an efficient design methodology for improving the performance and power efficiency of digital systems by allowing a negligible loss in output accuracy. Dedicated hardware accelerators built from approximate circuits can address the power-performance trade-off in computationally complex applications such as deep learning. This paper proposes an approximate radix-4 Booth multiplier and a hardware accelerator for deploying deep learning applications on power-restricted mobile/edge computing devices. The proposed accelerator uses parallel processing elements based on the approximate multiplier to accelerate the workloads. It is tested with matrix-vector multiplication (MVM) and matrix-matrix multiplication (MMM) workloads on a Zynq ZCU102 evaluation board. The experimental results show that the average power consumption of the proposed accelerator is 34% lower for MVM and 40% lower for MMM than that of the conventional multiply-accumulate unit used in the literature to implement similar workloads. Moreover, the proposed accelerator achieved an average performance of 5 GOP/s for MVM and 42.5 GOP/s for MMM at 275 MHz, which are 14x and 5x improvements, respectively, over the conventional design.
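For readers unfamiliar with radix-4 Booth multiplication, the following is a minimal Python sketch of the exact (non-approximate) recoding that such a multiplier builds on. The abstract does not describe the approximation scheme or the accelerator's datapath, so none of that is modelled here; the function names `booth_radix4_digits` and `booth_multiply` and the 8-bit default width are assumptions chosen for illustration, not taken from the paper.

```python
def booth_radix4_digits(multiplier: int, bits: int = 8) -> list[int]:
    """Recode a signed two's-complement multiplier into radix-4 Booth digits.

    Returns digits in {-2, -1, 0, 1, 2}, least significant first, such that
    multiplier == sum(d * 4**i for i, d in enumerate(digits)).
    """
    if not -(1 << (bits - 1)) <= multiplier < (1 << (bits - 1)):
        raise ValueError("multiplier does not fit in the given bit width")
    if bits % 2:
        bits += 1  # sign-extend to an even width so bits pair up cleanly
    pattern = multiplier & ((1 << bits) - 1)  # two's-complement bit pattern
    padded = pattern << 1                     # implicit 0 below the LSB
    digits = []
    for i in range(0, bits, 2):
        triplet = (padded >> i) & 0b111       # bits y[i+1], y[i], y[i-1]
        low = triplet & 1
        mid = (triplet >> 1) & 1
        high = (triplet >> 2) & 1
        digits.append(low + mid - 2 * high)   # standard Booth digit value
    return digits


def booth_multiply(multiplicand: int, multiplier: int, bits: int = 8) -> int:
    """Exact product formed by summing Booth-selected partial products."""
    product = 0
    for i, digit in enumerate(booth_radix4_digits(multiplier, bits)):
        # Each digit selects 0, +/-1x or +/-2x the multiplicand, weighted by 4**i.
        product += (digit * multiplicand) << (2 * i)
    return product


if __name__ == "__main__":
    print(booth_radix4_digits(7, 4))
    print(booth_multiply(-25, 13))
```

Radix-4 recoding halves the number of partial products relative to a bit-by-bit multiplier (four digits for an 8-bit operand), which is why it is a common starting point for low-power multiplier designs. An approximate variant would trade some exactness in generating or accumulating these partial products for lower power, but how the paper does so is not stated in the abstract.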
