Article

Breaking High-Resolution CNN Bandwidth Barriers With Enhanced Depth-First Execution

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/JETCAS.2019.2905361

Keywords

Neural networks; memory management; high resolution imaging; neural network hardware

Funding

  1. Research Foundation - Flanders (FWO) through the OmniDrone SBO [S003817N]

Convolutional neural networks (CNNs) have also begun to achieve impressive performance on non-classification image processing tasks, such as denoising, demosaicing, super-resolution, and super slow motion. Consequently, CNNs are increasingly deployed on very high-resolution images. However, the resulting high-resolution feature maps place unprecedented demands on the memory system of neural network processors: on-chip memories are too small to store such feature maps, while off-chip memories are very costly in terms of I/O bandwidth and power. This paper first shows that classical layer-by-layer inference is bounded in its external I/O bandwidth versus on-chip memory tradeoff space, making it infeasible to scale to very high resolutions at a reasonable cost. Next, we demonstrate how an alternative depth-first network computation can reduce I/O bandwidth requirements by over 200x for a fixed on-chip memory size or, alternatively, reduce on-chip memory requirements by over 10000x for a fixed I/O bandwidth limit, in the best cases. We further introduce an enhanced depth-first method that exploits both line buffers and tiling to further improve the external I/O bandwidth versus on-chip memory capacity tradeoff, and we quantify its improvements over the current state of the art.
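To make the tradeoff in the abstract concrete, the sketch below is a hypothetical first-order cost model, not the paper's own model. It assumes a stack of stride-1, "same"-padded k x k convolutions with equal feature-map sizes and counts only activation bytes (weights, partial sums and recomputation are ignored). The names `Workload`, `layer_by_layer`, `depth_first_lines` and `depth_first_strips` are illustrative inventions. `depth_first_lines` models row-wise depth-first execution with per-layer line buffers; `depth_first_strips` is our simplified stand-in for combining line buffers with tiling (vertical strips with a halo), which may differ from the paper's enhanced method.

```python
"""Toy first-order model of off-chip I/O vs on-chip memory for a CNN layer stack.

Hypothetical assumptions (illustrative only, not the paper's cost model):
  * `layers` stride-1 k x k convolutions with 'same' padding (k odd),
  * every feature map is height x width pixels with `channels` channels,
  * `bytes_per_elem` bytes per activation; weights, partial sums, image-border
    effects and recomputation cost are ignored.
"""
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    height: int
    width: int
    channels: int
    layers: int
    kernel: int = 3
    bytes_per_elem: int = 1

    def row_bytes(self, cols: int) -> int:
        # Bytes in one row of `cols` pixels across all channels.
        return cols * self.channels * self.bytes_per_elem

    @property
    def fmap_bytes(self) -> int:
        # Bytes in one full feature map.
        return self.height * self.row_bytes(self.width)


def layer_by_layer(w: Workload, on_chip_bytes: int) -> int:
    """Off-chip bytes when each layer finishes before the next one starts.

    Intermediates stay on chip only if a layer's full input and output maps fit
    together; otherwise every intermediate map is written out and read back.
    """
    io = 2 * w.fmap_bytes  # read the network input, write the network output
    if on_chip_bytes < 2 * w.fmap_bytes:
        io += 2 * (w.layers - 1) * w.fmap_bytes  # spill + refetch intermediates
    return io


def depth_first_lines(w: Workload) -> tuple[int, int]:
    """(on-chip, off-chip) bytes for row-by-row depth-first execution.

    Each layer keeps a line buffer of `kernel` input rows (a simple upper bound)
    and emits an output row as soon as enough input rows have arrived, so no
    intermediate feature map ever leaves the chip.
    """
    on_chip = w.layers * w.kernel * w.row_bytes(w.width)
    return on_chip, 2 * w.fmap_bytes


def depth_first_strips(w: Workload, strip: int) -> tuple[int, int]:
    """(on-chip, off-chip) bytes for line buffers inside vertical strips.

    Each strip produces `strip` output columns, so its line buffers are narrow,
    but it must read `halo` extra input columns (and recompute the overlapping
    intermediates). All line buffers are sized for the widest (first) layer.
    """
    halo = w.layers * (w.kernel - 1)  # extra input columns, both sides together
    n_strips = math.ceil(w.width / strip)
    on_chip = w.layers * w.kernel * w.row_bytes(strip + halo)
    reads = n_strips * w.height * w.row_bytes(strip + halo)  # input incl. halos
    return on_chip, reads + w.fmap_bytes


if __name__ == "__main__":
    # Hypothetical workload: a 4K frame, 32 channels, 8 conv layers, 8-bit data.
    w = Workload(height=2160, width=3840, channels=32, layers=8)
    print("layer-by-layer, 1 MiB on chip:", layer_by_layer(w, on_chip_bytes=1 << 20))
    print("depth-first, full-width lines:", depth_first_lines(w))
    print("depth-first, 256-column strips:", depth_first_strips(w, strip=256))
```

With these made-up parameters the model gives roughly 4.2 GB of off-chip traffic per frame for layer-by-layer execution (every intermediate map spills), about 2.9 MB of line buffers and 0.53 GB of traffic for full-width depth-first execution, and about 0.21 MB of buffers for 0.55 GB of traffic with 256-column strips. These toy figures only illustrate the shape of the tradeoff, where strips shrink on-chip memory at the price of halo reads and recomputation; they are not the paper's reported results.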
