Journal
IEEE LATIN AMERICA TRANSACTIONS
Volume 18, Issue 5, Pages 971-982
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TLA.2020.9082927
Keywords
Convolutional Neural Networks (CNN); Deep Learning; Embedded systems; Field Programmable Gate Arrays (FPGAs); Hardware accelerators; Layer Operation Chaining; Machine Learning; Single computation engine; Streaming architectures
Convolutional neural networks (CNNs) have become one of the key machine learning algorithms for content classification of digital images. Nevertheless, the computational complexity of CNNs is considerably larger than that of classic algorithms. CPU- or GPU-based platforms are therefore generally used for CNN implementations in many applications, but they often fail to meet portability requirements because of resource, energy, and real-time constraints. There is consequently growing interest in real-time processing solutions for object recognition using CNNs implemented mainly on embedded systems, which are limited in both resources and energy consumption. This paper presents an updated review of prominent reported approaches for mapping CNNs onto embedded systems. Two main solution trends for reducing the hardware CNN workload are distinguished through a deduced taxonomy. Algorithm-level solutions focus on reducing the number of multiplications and CNN coefficients. Hardware-level solutions, in contrast, aim to reduce processing time, power consumption, and hardware resource usage. Two dominant hardware-level design strategies are pointed out: one oriented to reducing energy consumption and resource utilization while meeting real-time requirements, the other to increasing throughput at the expense of resource utilization. Finally, two identified design strategies for CNN hardware accelerators are proposed as areas of research opportunity.