Article

Hardware designs for convolutional neural networks: Memoryful, memoryless and cached

Journal

INTEGRATION-THE VLSI JOURNAL
Volume 94, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.vlsi.2023.102074

Keywords

Convolutional Neural Network (CNN); Hardware Neural Network (HNN); LeNet-5; MNIST; FPGA; Hardware architecture for machine learning; Reconfigurable computing; Parallelization; Low-power

The memoryful architecture reduces processing time by using more memory; the memoryless architecture avoids memory usage but compromises processing time; the cache memory-based architecture strikes a balance between memory usage and processing performance.
This work presents three hardware architectures for convolutional neural networks with a high degree of parallelism and component reuse, implemented in a programmable device. The first design, termed the memoryful architecture, uses as much memory as necessary to store the input data and intermediate results. The second design, termed the memoryless architecture, defines and exploits a specific input sequencing pattern to avoid the use of memory entirely. The third design, termed the cache memory-based architecture, is an intermediate solution in which the input sequence is explored further: a cache memory stores some intermediate results and, consequently, improves processing performance. We compare the three designs in terms of power, area, and processing time. Allowing memory usage in the memoryful architecture increases the overall hardware cost but reduces processing time. Preventing all memory usage in the memoryless architecture increases operation parallelism but compromises processing time. The cache memory-based architecture achieves a trade-off between memory usage and processing performance: its processing time is 3x shorter than that of the memoryful architecture, running at a clock frequency about 20% higher, and about 13x shorter than that of the memoryless design, even at a clock frequency about 1.5% lower. The improvement in clock frequency and processing performance comes at a cost in hardware resources: depending on the cache size, the cache memory-based design may require up to 25% more logic elements than the memoryful and memoryless architectures.
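As a rough software analogy (not the paper's RTL; the kernel size, image dimensions, weights, and names below are hypothetical), the following Python sketch contrasts buffering an entire input frame, in the spirit of the memoryful design, with keeping only K rows in a rolling line buffer, in the spirit of the cache memory-based design, for a single KxK convolution over a row-major pixel stream.

from collections import deque

K = 3                          # kernel height/width (hypothetical)
KERNEL = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]          # arbitrary example weights

def conv_full_buffer(stream, width, height):
    # Memoryful-style analogy: store the whole frame (width*height words),
    # then convolve once all data is available.
    frame = [[0] * width for _ in range(height)]
    for idx, px in enumerate(stream):
        frame[idx // width][idx % width] = px
    out = []
    for r in range(height - K + 1):
        for c in range(width - K + 1):
            out.append(sum(KERNEL[i][j] * frame[r + i][c + j]
                           for i in range(K) for j in range(K)))
    return out

def conv_line_buffer(stream, width, height):
    # Cache-style analogy: keep only K rows in flight (K*width words) and
    # emit outputs as soon as a full KxK window is available.
    rows = deque(maxlen=K)     # plays the role of the on-chip cache
    current, out = [], []
    for px in stream:
        current.append(px)
        if len(current) == width:          # a complete row has arrived
            rows.append(current)
            current = []
            if len(rows) == K:             # enough rows for KxK windows
                for c in range(width - K + 1):
                    out.append(sum(KERNEL[i][j] * rows[i][c + j]
                                   for i in range(K) for j in range(K)))
    return out

if __name__ == "__main__":
    W, H = 8, 6
    pixels = list(range(W * H))            # toy row-major input stream
    assert conv_full_buffer(pixels, W, H) == conv_line_buffer(pixels, W, H)
    print("identical outputs; only the amount of buffered data differs")

In hardware terms, the first routine corresponds to a frame-sized RAM, while the second needs only K*width words of storage but imposes a stricter input ordering, which is the kind of memory-versus-sequencing trade-off the three architectures explore.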
