Journal
INTEGRATION-THE VLSI JOURNAL
Volume 81, Pages 268-279
Publisher
ELSEVIER
DOI: 10.1016/j.vlsi.2021.08.001
Keywords
Approximate computing; Approximate multiplier; Hardware accelerator; Edge computing; Matrix multiplication
Funding
- Ministry of Electronics and Information Technology (MeitY), Government of India [MEITY-PHD-2145]
Approximate computing is a design methodology that trades a slight loss in output accuracy for better performance and power efficiency in digital systems. This paper proposes an approximate radix-4 Booth multiplier and a hardware accelerator built on it, aimed at deep learning applications on power-restricted devices. In experiments, average power consumption fell by 34% for matrix-vector multiplication (MVM) and by 40% for matrix-matrix multiplication (MMM) workloads, while performance improved 14x and 5x respectively over a conventional multiply-accumulate design.
Approximate computing has emerged as an efficient design methodology for improving the performance and power efficiency of digital systems by allowing a negligible loss in output accuracy. Dedicated hardware accelerators built from approximate circuits can ease the power-performance trade-off in computationally complex applications such as deep learning. This paper proposes an approximate radix-4 Booth multiplier and a hardware accelerator for deploying deep learning applications on power-restricted mobile/edge computing devices. The accelerator uses parallel processing elements based on the approximate multiplier to accelerate the workloads. It is tested with matrix-vector multiplication (MVM) and matrix-matrix multiplication (MMM) workloads on a Zynq ZCU102 evaluation board. The experimental results show that the average power consumption of the proposed accelerator is reduced by 34% for MVM and 40% for MMM, compared with the conventional multiply-accumulate unit used in the literature to implement similar workloads. Moreover, the proposed accelerator achieves an average performance of 5 GOP/s for MVM and 42.5 GOP/s for MMM at 275 MHz, which are 14x and 5x improvements, respectively, over the conventional design.
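For readers unfamiliar with radix-4 Booth multiplication, the following is a minimal software sketch of the standard recoding, not the circuit proposed in the paper. The abstract does not describe how the multiplier is approximated, so the `truncate_lsbs` parameter below is a purely hypothetical stand-in (dropping the low bits of each partial product) that only illustrates the accuracy-for-hardware trade-off. The function names and the 8-bit default width are likewise assumptions made for illustration.

```python
def booth_radix4_digits(y: int, bits: int = 8) -> list:
    """Recode a signed two's-complement multiplier into radix-4 Booth digits.

    Each digit lies in {-2, -1, 0, 1, 2} and sum(d * 4**i) == y.
    """
    if bits % 2 != 0:
        raise ValueError("bits must be even for radix-4 recoding")
    if not -(1 << (bits - 1)) <= y < (1 << (bits - 1)):
        raise ValueError("y does not fit in the given signed width")
    u = y & ((1 << bits) - 1)   # two's-complement bit pattern of y
    digits = []
    prev = 0                    # implicit bit y[-1] = 0
    for i in range(0, bits, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_multiply(x: int, y: int, bits: int = 8, truncate_lsbs: int = 0) -> int:
    """Multiply x by y by summing radix-4 Booth partial products.

    truncate_lsbs is a HYPOTHETICAL approximation knob (not from the paper):
    the lowest truncate_lsbs bits of every partial product are cleared,
    which rounds each partial product down. With truncate_lsbs=0 the
    result is exact.
    """
    mask = ~((1 << truncate_lsbs) - 1)
    acc = 0
    for i, d in enumerate(booth_radix4_digits(y, bits)):
        partial = (d * x) << (2 * i)   # digit times multiplicand, weighted by 4**i
        acc += partial & mask          # clear low bits (floor to a multiple of 2**t)
    return acc


print(booth_radix4_digits(-7))   # [1, -2, 0, 0]
print(booth_multiply(13, -7))    # -91 (exact, no truncation)
```

An 8-bit multiplier yields only four partial products instead of eight, which is the usual reason for choosing radix-4 Booth recoding in hardware. In this sketch, each truncated partial product loses at most 2^t - 1, so the approximate result never exceeds the exact product and the total error is at most (bits/2)·(2^t - 1).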