Article

Data multiplexed and hardware reused architecture for deep neural network accelerator

Journal

NEUROCOMPUTING
Volume 486, Pages 147-159

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2021.11.018

Keywords

Activation function; Embedded system design; Hardware reused architecture; Deep neural network; Data multiplexing; Programmable logic; Processing system

Funding

  1. University Grant Commission (UGC) New Delhi, Government of India under SRF scheme [22745/(NET-DEC. 2015)]
  2. Slovenian Research Agency [P2-0098]
  3. ECSEL Joint Undertaking [876038, 101007273]


This paper proposes an improved DNN architecture that reuses hardware-costly activation functions through data multiplexing and optimizes the memory bandwidth requirement. High-throughput, resource-efficient memory elements for the sigmoid activation function are extracted via a Taylor series whose expansion order is tuned to improve test accuracy. Experimental results show that the proposed architecture reduces hardware resources and power consumption compared with other state-of-the-art implementations.
Despite many decades of research on high-performance Deep Neural Network (DNN) accelerators, their massive computational demand still requires resource-efficient, optimized, and parallel architectures for computational acceleration. Contemporary hardware implementations of DNNs face the burden of excess area requirements due to resource-intensive elements such as multipliers and non-linear Activation Functions (AFs). This paper proposes a DNN architecture that reuses the hardware-costly AF by multiplexing data through a shift register. An on-chip quantized log2-based memory addressing scheme with an optimized access technique is used to fetch input features, weights, and biases; this reduces the external memory bandwidth requirement and allows it to be adjusted dynamically for different DNNs. Further, high-throughput and resource-efficient memory elements for the sigmoid activation function are extracted using the Taylor series, and its expansion order has been tuned for better test accuracy. The performance is validated and compared with previous works on the MNIST dataset. In addition, the digital design of the AF is synthesized at the 45 nm technology node and its physical parameters are compared with previous works. The proposed hardware-reused architecture is verified for a 16:16:10:4 neural network using 8-bit dynamic fixed-point arithmetic and implemented on a Xilinx Zynq xc7z010clg400 SoC with a 100 MHz clock. The implemented architecture uses 25% less hardware resources and consumes 12% less power than other state-of-the-art implementations without performance loss; lower hardware resource usage and power consumption are especially important for increasingly relevant edge-computing solutions.

(c) 2021 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
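The Taylor-series approach to the sigmoid AF mentioned in the abstract can be sketched in software. The function names, the coefficients kept, and the truncation orders below are illustrative, not taken from the paper; only the idea of tuning the expansion order is from the abstract:

```python
import math

def sigmoid(x):
    """Reference sigmoid for comparison."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_taylor(x, order=5):
    """Truncated Taylor expansion of sigmoid about x = 0.

    The series is 1/2 + x/4 - x^3/48 + x^5/480 - 17x^7/80640 + ...
    (only odd powers appear besides the constant). `order` is the
    tunable truncation order; a higher order trades more terms for
    lower approximation error near zero.
    """
    coeffs = {1: 1 / 4, 3: -1 / 48, 5: 1 / 480, 7: -17 / 80640}
    y = 0.5
    for power, c in coeffs.items():
        if power <= order:
            y += c * x ** power
    return y
```

For small inputs the error drops quickly as the order grows, which is why the expansion order is worth tuning against test accuracy.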
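The hardware-reuse idea itself, one AF unit time-shared across a layer's neurons via a shift register, can be modeled in software as a serial loop. This is a toy behavioral model under my own naming (`shared_af_layer` is hypothetical); the paper's actual design is register-transfer-level hardware:

```python
from collections import deque

def shared_af_layer(preacts, af):
    """Apply ONE activation-function unit to a whole layer.

    Instead of instantiating an AF per neuron, pre-activations are
    shifted through a single shared unit one per cycle, which is the
    data-multiplexing/hardware-reuse pattern the abstract describes.
    """
    shift_reg = deque(preacts)   # models the data-multiplexing shift register
    outputs = []
    while shift_reg:
        outputs.append(af(shift_reg.popleft()))  # one AF, reused every cycle
    return outputs
```

In hardware this trades latency (one AF evaluation per cycle) for area, since the costly non-linear unit exists only once.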
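The 8-bit dynamic fixed-point arithmetic used for verification can likewise be sketched. The word width matches the abstract; the fraction-bit count and function name are illustrative assumptions:

```python
def quantize_dfp(x, total_bits=8, frac_bits=4):
    """Quantize x to dynamic fixed-point and return its real value.

    Dynamic fixed-point keeps a shared scaling (here `frac_bits`,
    chosen per tensor/layer) alongside narrow integer words.
    total_bits=8 follows the abstract; frac_bits=4 is illustrative.
    """
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))          # e.g. -128 for 8 bits
    hi = (1 << (total_bits - 1)) - 1       # e.g. +127 for 8 bits
    q = max(lo, min(hi, round(x * scale))) # round, then saturate
    return q / scale
```

Values outside the representable range saturate, so the choice of fraction bits per layer is what makes the format "dynamic".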

