Article

Hardware designs for convolutional neural networks: Memoryful, memoryless and cached

Journal

INTEGRATION-THE VLSI JOURNAL
Volume 94, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.vlsi.2023.102074

Keywords

Convolutional Neural Network (CNN); Hardware Neural Network (HNN); LeNet-5; MNIST; FPGA; Hardware architecture for machine learning; Reconfigurable computing; Parallelization; Low-power

The memoryful architecture reduces processing time by using more memory; the memoryless architecture avoids memory usage but compromises processing time; the cache memory-based architecture strikes a balance between memory usage and processing performance.
This work presents three hardware architectures for convolutional neural networks with a high degree of parallelism and component reuse, implemented in a programmable device. The first design, termed the memoryful architecture, uses as much memory as necessary to store the input data and intermediate results. The second design, termed the memoryless architecture, defines and exploits a specific input sequencing pattern to avoid the use of memory entirely. The third design, termed the cache memory-based architecture, is an intermediate solution in which the input sequence is explored further: a cache memory stores some intermediate results and, consequently, improves processing performance. We compare the three designs in terms of power, area, and processing time. Allowing memory usage in the memoryful architecture increases the overall hardware cost but reduces processing time. Preventing all memory usage in the memoryless architecture increases operation parallelism but compromises processing time. The cache memory-based architecture achieves a trade-off between memory usage and processing performance: its processing time is 3x shorter than that of the memoryful architecture, running at a clock frequency about 20% higher, and about 13x shorter than that of the memoryless design, even at a clock frequency about 1.5% lower. The improvement in clock frequency and processing performance comes at a cost in hardware resources: depending on the cache size, the cache memory-based design may require up to 25% more logic elements than the memoryful and memoryless architectures.
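As a rough software analogy (not the paper's RTL; the kernel size, image dimensions, weights, and names below are hypothetical), the following Python sketch contrasts buffering an entire input frame, in the spirit of the memoryful design, with keeping only K rows in a rolling line buffer, in the spirit of the cache memory-based design, for a single KxK convolution over a row-major pixel stream.

from collections import deque

K = 3                          # kernel height/width (hypothetical)
KERNEL = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]          # arbitrary example weights

def conv_full_buffer(stream, width, height):
    # Memoryful-style analogy: store the whole frame (width*height words),
    # then convolve once all data is available.
    frame = [[0] * width for _ in range(height)]
    for idx, px in enumerate(stream):
        frame[idx // width][idx % width] = px
    out = []
    for r in range(height - K + 1):
        for c in range(width - K + 1):
            out.append(sum(KERNEL[i][j] * frame[r + i][c + j]
                           for i in range(K) for j in range(K)))
    return out

def conv_line_buffer(stream, width, height):
    # Cache-style analogy: keep only K rows in flight (K*width words) and
    # emit outputs as soon as a full KxK window is available.
    rows = deque(maxlen=K)     # plays the role of the on-chip cache
    current, out = [], []
    for px in stream:
        current.append(px)
        if len(current) == width:          # a complete row has arrived
            rows.append(current)
            current = []
            if len(rows) == K:             # enough rows for KxK windows
                for c in range(width - K + 1):
                    out.append(sum(KERNEL[i][j] * rows[i][c + j]
                                   for i in range(K) for j in range(K)))
    return out

if __name__ == "__main__":
    W, H = 8, 6
    pixels = list(range(W * H))            # toy row-major input stream
    assert conv_full_buffer(pixels, W, H) == conv_line_buffer(pixels, W, H)
    print("identical outputs; only the amount of buffered data differs")

In hardware terms, the first routine corresponds to a frame-sized RAM, while the second needs only K*width words of storage but imposes a stricter input ordering, which is the kind of memory-versus-sequencing trade-off the three architectures explore.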
