4.3 Article

A bandwidth enhancement method of VTA based on paralleled memory access design

Journal

INTEGRATION-THE VLSI JOURNAL
Volume 94, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.vlsi.2023.102102

Keywords

DNN accelerator; VTA; FPGA; Paralleled memory access

This study proposes an enhanced VTA architecture, which achieves parallel loading of feature maps and weight data by redesigning and optimizing the VTA memory access microarchitecture, fully utilizing bandwidth resources. The experimental results demonstrate significant performance improvement of the proposed architecture, with higher speedup and power efficiency compared to other full-stack accelerator designs.
With an operator-level optimized compiler and accelerator back-ends, the TVM-VTA stack is a reconfigurable tensor accelerator built on hardware/software co-design. However, the VTA architecture produced by HLS compilation does not make full use of hardware resources, leaving considerable room for optimization. Specifically, the hardware throughput does not make good use of the available memory bandwidth, which creates a performance bottleneck. We therefore propose Enhanced VTA, a parallel-channel design built from RTL-HLS hybrid templates that remains compatible with the full-stack framework. The VTA memory access microarchitecture is redesigned and optimized around the resources of the hardware platform, so that feature maps and weight data are loaded in parallel and the bandwidth is fully used. A hardware and software environment is built on the Xilinx ZCU104 development board, and the YOLOV3-Tiny and YOLOV3 networks are deployed on it. Peak computing power reaches 361 GOP/s at 200 MHz, which is 99.64x that of the original VTA on the PYNQ Z1 platform. YOLOV3-Tiny performance is the highest among the results publicly reported in the TVM community. Overall performance figures for YOLOV3 on TVM-VTA are reported for the first time, with a normalized speedup of 2.2x over NVDLA. The design compares favorably with other full-stack accelerator designs in both speedup and power efficiency.
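As a rough sanity check on the headline figures, the short sketch below derives two quantities from the numbers quoted in the abstract: the operations per clock cycle implied by 361 GOP/s at 200 MHz, and the baseline throughput implied by the 99.64x ratio. The function names are illustrative only. The sketch assumes that 1 GOP/s means 10^9 operations per second and that the 99.64x ratio applies to the same throughput metric on both platforms, which the abstract does not state explicitly.

```python
# Back-of-the-envelope arithmetic on figures quoted in the abstract.
# Assumptions (not confirmed by the source): 1 GOP/s = 1e9 operations
# per second, and the 99.64x ratio compares like-for-like throughput.

PEAK_GOPS = 361.0            # peak throughput of the enhanced design, GOP/s
CLOCK_MHZ = 200.0            # clock frequency on the ZCU104, MHz
SPEEDUP_VS_ORIGINAL = 99.64  # ratio to the original VTA on PYNQ Z1


def ops_per_cycle(peak_gops: float, clock_mhz: float) -> float:
    """Operations per clock cycle implied by a throughput and a clock rate."""
    return (peak_gops * 1e9) / (clock_mhz * 1e6)


def implied_baseline_gops(peak_gops: float, speedup: float) -> float:
    """Baseline throughput (GOP/s) implied by a peak figure and a speedup ratio."""
    return peak_gops / speedup


if __name__ == "__main__":
    print(f"ops per cycle:    {ops_per_cycle(PEAK_GOPS, CLOCK_MHZ):.0f}")
    print(f"implied baseline: "
          f"{implied_baseline_gops(PEAK_GOPS, SPEEDUP_VS_ORIGINAL):.2f} GOP/s")
```

Under these assumptions the quoted figures correspond to roughly 1,805 operations per cycle and an implied baseline of about 3.6 GOP/s for the original VTA configuration.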
