4.3 Article

Quantization aware approximate multiplier and hardware accelerator for edge computing of deep learning applications

Journal

INTEGRATION-THE VLSI JOURNAL
Volume 81, Issue -, Pages 268-279

Publisher

ELSEVIER
DOI: 10.1016/j.vlsi.2021.08.001

Keywords

Approximate computing; Approximate multiplier; Hardware accelerator; Edge computing; Matrix multiplication

Funding

  1. Ministry of Electronics and Information Technology (MeitY), Government of India [MEITY-PHD-2145]

Approximate computing is a design methodology that trades a slight loss in output accuracy for better performance and power efficiency in digital systems. The proposed approximate radix-4 Booth multiplier and hardware accelerator target deep learning applications on power-restricted devices. Experimental results show average power consumption that is 34% and 40% lower for matrix-vector multiplication (MVM) and matrix-matrix multiplication (MMM) workloads, respectively, along with 14x and 5x higher performance than the conventional design.
Approximate computing has emerged as an efficient design methodology for improving the performance and power efficiency of digital systems by allowing a negligible loss in output accuracy. Dedicated hardware accelerators built from approximate circuits can address the power-performance trade-off in computationally complex applications such as deep learning. This paper proposes an approximate radix-4 Booth multiplier and a hardware accelerator for deploying deep learning applications on power-restricted mobile/edge computing devices. The accelerator uses parallel processing elements based on the approximate multiplier to accelerate the workloads. It is tested with matrix-vector multiplication (MVM) and matrix-matrix multiplication (MMM) workloads on a Zynq ZCU102 evaluation board. The experimental results show that the average power consumption of the proposed accelerator is 34% and 40% lower for MVM and MMM, respectively, than that of the conventional multiply-accumulate unit used in the literature to implement similar workloads. Moreover, the proposed accelerator achieves an average performance of 5 GOP/s and 42.5 GOP/s for MVM and MMM, respectively, at 275 MHz, which are 14x and 5x improvements, respectively, over the conventional design.
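
As a rough illustration of the arithmetic involved, the following Python sketch models a radix-4 Booth multiplier in software and uses it for a matrix-vector product. The recoding itself is the standard radix-4 Booth scheme. The approximation shown (clearing the low `drop_lsbs` bits of every partial product) is only an assumed stand-in for illustration and is not the approximation scheme proposed in the paper. All function and parameter names here (`booth_radix4_digits`, `booth_multiply`, `mvm`, `drop_lsbs`) are hypothetical.

```python
def booth_radix4_digits(y: int, bits: int = 8) -> list[int]:
    """Recode a signed two's-complement multiplier into radix-4 Booth digits.

    Each digit lies in {-2, -1, 0, 1, 2}; digits are returned least
    significant first. `y` must fit in `bits` bits as a signed value.
    """
    if bits % 2:
        bits += 1  # sign-extend to an even width
    digits = []
    prev = 0  # implicit bit y[-1] = 0
    for i in range(0, bits, 2):
        # Python ints act as infinitely sign-extended two's complement,
        # so bit extraction also works for negative y.
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x: int, y: int, bits: int = 8, drop_lsbs: int = 0) -> int:
    """Multiply x by y by summing radix-4 Booth partial products.

    drop_lsbs = 0 gives the exact product. A positive value clears that
    many low bits of every partial product (an assumed, illustrative
    approximation, not the paper's design).
    """
    mask = ~((1 << drop_lsbs) - 1)
    acc = 0
    for i, digit in enumerate(booth_radix4_digits(y, bits)):
        partial = (digit * x) << (2 * i)  # digit * x * 4**i
        acc += partial & mask
    return acc


def mvm(matrix, vector, bits: int = 8, drop_lsbs: int = 0) -> list[int]:
    """Matrix-vector product where every multiply uses booth_multiply."""
    return [
        sum(booth_multiply(a, v, bits, drop_lsbs) for a, v in zip(row, vector))
        for row in matrix
    ]
```

With `drop_lsbs=0` the model reproduces ordinary integer multiplication, which makes it a convenient reference when experimenting with other partial-product approximations. In hardware, each row's dot product would map onto a separate processing element working in parallel, whereas this sketch evaluates the rows sequentially.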
