Article

MVP: An Efficient CNN Accelerator with Matrix, Vector, and Processing-Near-Memory Units

Publisher

Association for Computing Machinery (ACM)
DOI: 10.1145/3497745

Keywords

Neural Networks; CNN; energy-efficient AI accelerator

Funding

  1. Samsung Advanced Institute of Technology
  2. Engineering Research Center Program through the National Research Foundation of Korea (NRF), funded by the Korean government (MSIT) [NRF-2018R1A5A1059921]
  3. IC Design Education Center

Abstract

Mobile and edge devices have become common platforms for inferring convolutional neural networks (CNNs) due to their superior privacy and service quality. To reduce the computational cost of convolution (CONV), recent CNN models adopt depth-wise CONV (DW-CONV) and Squeeze-and-Excitation (SE). However, existing area-efficient CNN accelerators are sub-optimal for these latest CNN models because they were mainly optimized for compute-intensive standard CONV layers with abundant data reuse, which can be pipelined with activation and normalization operations. In contrast, DW-CONV and SE are memory-intensive with limited data reuse. The latter also strongly depends on the nearby CONV layers, making effective pipelining a daunting task. As a result, although DW-CONV and SE occupy only 10% of all operations, they become memory-bandwidth bound, consuming more than 60% of the processing time in systolic-array-based accelerators. We propose a CNN acceleration architecture called MVP, which efficiently processes both compute- and memory-intensive operations with a small area overhead on top of the baseline systolic-array-based architecture. We suggest a specialized vector unit tailored for processing DW-CONV, including multipliers, adder trees, and multi-banked buffers, to meet its high memory bandwidth requirement. We augment the unified buffer with tiny processing elements to smoothly pipeline SE with the subsequent CONV, enabling concurrent processing of DW-CONV with standard CONV and thereby achieving the maximum utilization of arithmetic units. Our evaluation shows that MVP improves performance by 2.6x and reduces energy by 47% on average for EfficientNet-B0/B4/B7, MnasNet, and MobileNet-V1/V2, with only a 9% area overhead compared to the baseline.
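The data-reuse gap the abstract describes can be made concrete with a small functional model. The sketch below is a hypothetical NumPy illustration, not code from the paper; all shapes, layer sizes, and the SE reduction ratio are assumptions chosen only to show why DW-CONV and SE have far less data reuse than standard CONV.

import numpy as np

def depthwise_conv(x, w):
    """Depth-wise CONV: each channel is filtered independently.

    x: input feature map, shape (H, W, C)
    w: per-channel kernels, shape (K, K, C)
    Each loaded activation feeds only ~K*K MACs, versus K*K*C_out MACs
    in a standard CONV, so arithmetic intensity is low.
    """
    H, W, C = x.shape
    K = w.shape[0]
    pad = K // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            # (K, K, C) window times (K, K, C) kernels, summed per channel
            out[i, j] = np.sum(xp[i:i+K, j:j+K] * w, axis=(0, 1))
    return out

def squeeze_excite(x, w1, w2):
    """Squeeze-and-Excitation: global pooling plus two tiny FC layers.

    The global reduction depends on the *entire* output of the preceding
    CONV, which is why SE is hard to pipeline with neighboring layers.
    """
    s = x.mean(axis=(0, 1))                  # squeeze: (C,)
    e = np.maximum(w1 @ s, 0.0)              # excite FC1 + ReLU: (C//r,)
    scale = 1.0 / (1.0 + np.exp(-(w2 @ e)))  # excite FC2 + sigmoid: (C,)
    return x * scale                         # channel-wise re-scaling

rng = np.random.default_rng(0)
H = W = 14
C, r = 64, 4                                 # hypothetical layer sizes
x = rng.standard_normal((H, W, C))
y = depthwise_conv(x, rng.standard_normal((3, 3, C)))
y = squeeze_excite(y, rng.standard_normal((C // r, C)),
                      rng.standard_normal((C, C // r)))

# Rough counts for this layer: DW-CONV does H*W*K*K*C MACs over about
# H*W*C activations (~9 MACs per value loaded), while a standard 3x3
# CONV with C output channels would do C times more MACs on the same
# activations -- the reuse gap the MVP vector unit is designed around.
print(y.shape)  # (14, 14, 64)

With only about nine MACs per loaded activation, DW-CONV cannot keep a large systolic array busy, and SE's global pooling must wait for the entire preceding CONV output; together these illustrate the abstract's observation that such layers consume over 60% of processing time despite being only 10% of the operations.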
