Article

Row-Streaming Dataflow Using a Chaining Buffer and Systolic Array+ Structure

Journal

IEEE COMPUTER ARCHITECTURE LETTERS
Volume 20, Issue 1, Pages 34-37

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/LCA.2021.3054371

Keywords

Kernel; Integrated circuits; System-on-chip; Arrays; Two dimensional displays; Memory management; Adders; Convolutional neural network; convolutional layer; general matrix multiplication; systolic array; image-to-column; lowering; ResNet; DenseNet

Funding

  1. Samsung Advanced Institute of Technology
  2. Engineering Research Center Program through the National Research Foundation of Korea - Korean Government MSIT [NRF-2018R1A5A1059921]
  3. IC Design Education Center

The SysAr+ structure proposed in this letter enhances data reuse in the CONV layer without im2col pre-processing, which reduces energy consumption and improves performance in ResNet and DenseNet models.

Convolutional Neural Networks (CNNs) are widely used to solve complex problems in various fields, such as image recognition, image classification, and video analysis. Convolutional (CONV) layers are the most computationally intensive part of CNN inference, and various architectures have been proposed to process them efficiently. Among these, a systolic array consists of a 2D array of processing elements that handle GEneral Matrix Multiplication (GEMM) with high efficiency. However, to process a CONV layer as a GEMM, image-to-column (im2col) processing, also called lowering, is required for each layer; this necessitates a larger on-chip memory and a considerable amount of repetitive on-chip memory access. In this letter, we propose a systolic array+ (SysAr+) structure, augmented with a chaining buffer and a row-streaming dataflow, that can maximize data reuse in the CONV layer without the im2col pre-process and without repetitive access to the large on-chip memory. By applying the proposed method to the 3x3 CONV layers, we reduce the energy consumption by up to 19.7 percent in ResNet and 37.4 percent in DenseNet with an area overhead of 1.54 percent in SysAr+, and we improve the performance by up to 32.4 percent in ResNet and 12.1 percent in DenseNet.
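To make the lowering step concrete, the following is a minimal Python sketch of generic im2col, not the SysAr+ design or the authors' implementation. It assumes a single input channel, stride 1, and no padding. The function names `im2col`, `conv_as_gemm`, `conv_direct`, and `duplication_factor` are hypothetical and chosen only for illustration. The sketch shows that a CONV layer computed as a GEMM matches the direct sliding-window result, and that the lowered matrix stores each input pixel several times.

```python
def im2col(image, k):
    """Lower a 2D image into a matrix whose rows are flattened k x k patches.

    Assumes a single channel, stride 1, and no padding (illustrative only).
    """
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            # Each output position gets its own copy of the k*k input window.
            rows.append([image[i + di][j + dj]
                         for di in range(k) for dj in range(k)])
    return rows


def conv_as_gemm(image, kernel):
    """Compute the convolution as (lowered matrix) x (flattened kernel)."""
    k = len(kernel)
    flat_kernel = [v for row in kernel for v in row]
    out_w = len(image[0]) - k + 1
    flat = [sum(a * b for a, b in zip(patch, flat_kernel))
            for patch in im2col(image, k)]
    # Reshape the flat result back into a 2D output feature map.
    return [flat[r:r + out_w] for r in range(0, len(flat), out_w)]


def conv_direct(image, kernel):
    """Reference sliding-window convolution without any lowering."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(k) for dj in range(k))
             for j in range(w - k + 1)]
            for i in range(h - k + 1)]


def duplication_factor(h, w, k):
    """Ratio of lowered-matrix elements to original input elements."""
    return (h - k + 1) * (w - k + 1) * k * k / (h * w)


if __name__ == "__main__":
    image = [[r * 4 + c for c in range(4)] for r in range(4)]
    kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
    print(conv_as_gemm(image, kernel) == conv_direct(image, kernel))
    print(duplication_factor(4, 4, 3))
```

Under these assumptions, the duplication factor for a 3x3 kernel approaches 9 as the feature map grows, which is the extra storage and repeated access that the abstract attributes to im2col.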
