Article

A streaming architecture for Convolutional Neural Networks based on layer operations chaining

Journal

JOURNAL OF REAL-TIME IMAGE PROCESSING
Volume 17, Issue 5, Pages 1715-1733

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s11554-019-00938-y

Keywords

Convolutional Neural Networks; Streaming architecture; Layer operation chaining


Convolutional Neural Networks (CNNs) have become one of the best machine learning algorithms for content classification of digital images. The computational complexity of CNNs is much larger than that of traditional algorithms, which is why Graphics Processing Units (GPUs) and online servers are commonly used to accelerate their operations. However, there is a growing demand for real-time processing solutions in the object recognition field, mainly implemented on embedded systems, which are limited in both resources and energy consumption. Recently reported works focus on minimizing the required resources through two design strategies. The first implements a single accelerator that can be adapted to the operations of the whole CNN. The second uses one accelerator for each convolution layer, which achieves higher performance when processing multiple images. This paper proposes a new design strategy based on multiple accelerators that use a layer operation chaining scheme to compute the operations of multiple CNN layers in parallel. Three types of parallel data processing are adopted in the proposed architecture, where the parallelism level for convolution layers is determined by cost-function-based algorithms. The proposed design strategy is demonstrated by implementing three naive CNNs on a De2i-150 board, achieving a peak acceleration of 18.04x compared with state-of-the-art design methods without layer operation chaining. Furthermore, design results for one modified Alexnet CNN were obtained. According to these results, the proposed design strategy achieves a smaller processing time than reported works using the other two design strategies. In addition, a competitive result in resource utilization is obtained for naive CNNs.
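The idea of layer operation chaining can be illustrated with a small software analogy. The sketch below is not the paper's hardware design: it is a minimal Python illustration, under the assumption that chaining means each layer starts consuming the previous layer's output as soon as enough rows are available, instead of waiting for a complete feature map. All function names (`conv_rows`, `relu_rows`, `maxpool_rows`) are hypothetical and chosen only for this example.

```python
from collections import deque


def conv_rows(rows, kernel):
    """Yield rows of a 'valid' 2D convolution (cross-correlation form)
    as soon as enough input rows have arrived."""
    k = len(kernel)
    window = deque(maxlen=k)  # holds only the last k input rows
    for row in rows:
        window.append(row)
        if len(window) == k:
            width = len(row) - k + 1
            yield [
                sum(kernel[i][j] * window[i][x + j]
                    for i in range(k) for j in range(k))
                for x in range(width)
            ]


def relu_rows(rows):
    """Apply ReLU to each row as it streams through."""
    for row in rows:
        yield [max(0, v) for v in row]


def maxpool_rows(rows, size=2):
    """Non-overlapping max pooling; emits one row per `size` input rows."""
    buf = []
    for row in rows:
        buf.append(row)
        if len(buf) == size:
            width = len(row) // size
            yield [
                max(buf[i][x * size + j]
                    for i in range(size) for j in range(size))
                for x in range(width)
            ]
            buf = []


def chained(image, kernel):
    """Chain convolution -> ReLU -> pooling as one streaming pipeline."""
    return list(maxpool_rows(relu_rows(conv_rows(iter(image), kernel))))
```

In this sketch the pooling stage emits its first output row after only three input rows have been read, which is the software counterpart of several layers being active at once. A hardware implementation would run the stages concurrently rather than interleaving them as Python generators do.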
