4.6 Article

A Low-Cost Fully Integer-Based CNN Accelerator on FPGA for Real-Time Traffic Sign Recognition

Journal

IEEE ACCESS
Volume 10, Issue -, Pages 84626-84634

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2022.3197906

Keywords

Traffic sign recognition; CNN; quantization; accelerator; FPGA

Funding

  1. National Research Foundation of Korea (NRF) - Korea government (MSIT, Ministry of Science and ICT) [2022R1G1A1007415]
  2. MSIT, Korea, under the Information Technology Research Center (ITRC) support program [IITP-2021-0-02052]
  3. MSIT
  4. National IT Industry Promotion Agency (NIPA)
  5. National Research Foundation of Korea [2022R1G1A1007415] Funding Source: Korea Institute of Science & Technology Information (KISTI), National Science & Technology Information Service (NTIS)

This paper proposes a low-cost CNN-based real-time TSR hardware accelerator, which reduces computational complexity by extending a hardware-friendly quantization method and balances real-time inference and resource consumption through two parallelization strategies. Experimental results show that the proposed scheme has good performance on embedded platforms.
Traffic sign recognition (TSR) technology allows a vehicle to recognize road signs through a camera and use them for driving. For traffic safety, TSR is one of the core technologies of advanced driver assistance systems (ADAS), and it has been the subject of several studies. The advent of convolutional neural networks (CNNs) has opened up new possibilities in automotive environments, especially for ADAS. However, deploying a real-time TSR application in resource-constrained ADAS is challenging because most CNNs require substantial computing resources and memory. To address this problem, some works have considered optimization for embedded platforms, but they either used many hardware resources or showed low computation performance. In this paper, we propose a low-cost CNN-based real-time TSR hardware accelerator. First, we extend a novel hardware-friendly quantization method to reduce computational complexity. The quantization method reconstructs the CNN so that all operations, including the skip connection path of residual blocks, use only integer arithmetic, and it reduces computational overhead by replacing the quantization affine mapping process with a shift operation. Second, the proposed hardware accelerator applies two parallelization strategies to balance real-time inference and resource consumption. In addition, we present a simple and effective hardware design scheme that handles the skip connection path of residual blocks. This design scheme optimizes the dataflow of the skip connection path and reduces additional internal memory usage. Experimental results show that the reconstructed fully integer-based CNN requires only 24M integer operations (IOPs) and has a model size of 0.17 MB. Compared with the previous work, the model size is reduced by 105× and the number of operations by 58×. In addition, the proposed CNN achieves a TSR accuracy of 99.07%, which is the highest accuracy among CNN-based TSR works implemented on embedded platforms. The proposed hardware accelerator achieves a computation performance of 960 MOPS and a frame rate of 40 FPS when implemented on a Xilinx ZC706 SoC. Consequently, this work improves computation performance by 11.87× and frame rate by 36.7× compared to the previous work.
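
The abstract's idea of replacing the quantization affine mapping with a shift, and of keeping the skip connection path integer-only, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that every scale is a power of two (so rescaling is a bit shift), that values are signed 8-bit, and that rounding is round-half-up; the abstract states none of these details. The names `requantize_shift`, `int_dot`, and `residual_add` are hypothetical.

```python
def requantize_shift(acc: int, shift: int, bits: int = 8) -> int:
    """Rescale an integer accumulator by 2**-shift, then saturate.

    Assumption: the scale is a power of two, so the usual affine
    requantization (multiply by a real-valued scale) becomes a shift.
    """
    if shift > 0:
        # Add half of the divisor first so the right shift rounds to nearest.
        acc = (acc + (1 << (shift - 1))) >> shift
    elif shift < 0:
        acc = acc << (-shift)
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, acc))


def int_dot(xs, ws):
    """Integer multiply-accumulate, the core operation of a convolution."""
    return sum(x * w for x, w in zip(xs, ws))


def residual_add(main_q, main_frac, skip_q, skip_frac, out_frac, bits=8):
    """Integer-only skip connection add.

    Each input holds integers whose real value is q * 2**-frac
    (an assumed fixed-point convention). Both paths are aligned to the
    finer scale with left shifts, added, and shifted to the output scale.
    """
    common = max(main_frac, skip_frac)
    out = []
    for m, s in zip(main_q, skip_q):
        total = (m << (common - main_frac)) + (s << (common - skip_frac))
        out.append(requantize_shift(total, common - out_frac, bits))
    return out


# Illustrative values only; they are not taken from the paper.
acc = int_dot([12, -7, 33], [5, 9, -2])   # 60 - 63 - 66 = -69
y = requantize_shift(acc, shift=3)        # -69 / 8 = -8.625, rounds to -9
z = residual_add([40, -100], 5, [10, 120], 4, out_frac=4)
```

Under these assumptions no floating-point multiply or divide appears anywhere in the forward path, which is the property the abstract attributes to the reconstructed network.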
