4.3 Article

Hardware designs for convolutional neural networks: Memoryful, memoryless and cached

Journal

INTEGRATION-THE VLSI JOURNAL
Volume 94

Publisher

ELSEVIER
DOI: 10.1016/j.vlsi.2023.102074

Keywords

Convolutional Neural Network (CNN); Hardware Neural Network (HNN); LeNet-5; MNIST; FPGA; Hardware architecture for machine learning; Reconfigurable computing; Parallelization; Low-power


The memoryful architecture reduces processing time by using more memory; the memoryless architecture avoids memory usage but compromises processing time; the cache memory-based architecture strikes a balance between memory usage and processing performance.
This work presents three hardware architectures for convolutional neural networks with a high degree of parallelism and component reuse, implemented in a programmable device. The first design, termed the memoryful architecture, uses as much memory as necessary to store the input data and intermediate results. The second design, termed the memoryless architecture, defines and exploits a specific input sequencing pattern to avoid the use of memory entirely. The third design, termed the cache memory-based architecture, is an intermediate solution in which the input sequence is explored further: a cache memory stores some intermediate results and, consequently, improves processing performance. We compare the three designs in terms of power, area, and processing time. Allowing memory usage in the memoryful architecture increases the overall hardware cost but reduces processing time. Preventing all memory usage in the memoryless architecture increases operation parallelism but compromises processing time. The cache memory-based architecture achieves a trade-off between memory usage and processing performance: its processing time is 3x shorter than that of the memoryful architecture, running at a clock frequency about 20% higher, and about 13x shorter than that of the memoryless design, even at a clock frequency about 1.5% lower. The improvement in clock frequency and processing performance comes at a cost in hardware resources for the cache memory-based architecture: depending on the cache size, its design may require up to 25% more logic elements than the memoryful and memoryless architectures.
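The trade-off described in the abstract can be illustrated in software. The sketch below (an illustrative model, not the authors' RTL design) contrasts a "memoryful"-style convolution, where the whole input image is held in memory, with a "cache"-style convolution over a streamed input that keeps only a small line buffer of K rows, analogous to storing a few intermediate rows instead of the full feature map. All function names and the 2x2 kernel are hypothetical, chosen only for the demonstration.

```python
def conv2d_full_buffer(image, kernel):
    """Memoryful-style reference: the entire input is available in memory."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    out = []
    for i in range(h - k + 1):
        row = []
        for j in range(w - k + 1):
            acc = 0
            for di in range(k):
                for dj in range(k):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def conv2d_line_buffer(rows, kernel):
    """Cache-style: rows arrive as a stream; only k rows are ever buffered."""
    k = len(kernel)
    buf = []  # the "cache": holds at most k input rows at a time
    out = []
    for row in rows:  # rows may be any iterator, i.e. streamed input
        buf.append(row)
        if len(buf) > k:
            buf.pop(0)  # drop the oldest row once it leaves the window
        if len(buf) == k:
            w = len(row)
            out_row = []
            for j in range(w - k + 1):
                acc = 0
                for di in range(k):
                    for dj in range(k):
                        acc += buf[di][j + dj] * kernel[di][dj]
                out_row.append(acc)
            out.append(out_row)
    return out


# Both variants produce the same result; the second needs only k rows of storage.
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 0], [0, 1]]
assert conv2d_full_buffer(image, kernel) == conv2d_line_buffer(iter(image), kernel)
```

In a hardware realization, the line buffer corresponds to on-chip cache storage sized by the kernel height, which is the intermediate point between buffering the full input (memoryful) and re-sequencing the input to buffer nothing (memoryless).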

