4.3 Article

Hardware designs for convolutional neural networks: Memoryful, memoryless and cached

Journal

INTEGRATION-THE VLSI JOURNAL
Volume 94

Publisher

ELSEVIER
DOI: 10.1016/j.vlsi.2023.102074

Keywords

Convolutional Neural Network (CNN); Hardware Neural Network (HNN); LeNet-5; MNIST; FPGA; Hardware architecture for machine learning; Reconfigurable computing; Parallelization; Low-power


The memoryful architecture reduces processing time by using more memory; the memoryless architecture avoids memory usage but compromises processing time; the cache memory-based architecture strikes a balance between memory usage and processing performance.
This work presents three hardware architectures for convolutional neural networks with a high degree of parallelism and component reuse, implemented in a programmable device. The first design, termed the memoryful architecture, uses as much memory as necessary to store the input data and intermediate results. The second design, termed the memoryless architecture, defines and exploits a specific input sequencing pattern to avoid the use of memory entirely. The third design, termed the cache memory-based architecture, is an intermediate solution in which the input sequence is explored further: a cache memory stores some intermediate results and, consequently, improves processing performance. We compare the three designs in terms of power, area, and processing time. Allowing memory usage in the memoryful architecture increases the overall hardware cost but reduces processing time. Preventing all memory usage in the memoryless architecture increases operation parallelism but compromises processing time. The cache memory-based architecture achieves a trade-off between memory usage and processing performance: its processing time is 3x shorter than that of the memoryful architecture, running at a clock frequency about 20% higher, and about 13x shorter than that of the memoryless design, even at a clock frequency about 1.5% lower. The improvement in clock frequency and processing performance comes at a cost in hardware resources for the cache memory-based architecture: depending on the cache size, its design may require up to 25% more logic elements than the memoryful and memoryless architectures.
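The trade-off described in the abstract can be illustrated in software. The sketch below (an illustrative model, not the authors' RTL design) contrasts a "memoryful"-style convolution, where the whole input image is held in memory, with a "cache"-style convolution over a streamed input that keeps only a small line buffer of K rows, analogous to storing a few intermediate rows instead of the full feature map. All function names and the 2x2 kernel are hypothetical, chosen only for the demonstration.

```python
def conv2d_full_buffer(image, kernel):
    """Memoryful-style reference: the entire input is available in memory."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    out = []
    for i in range(h - k + 1):
        row = []
        for j in range(w - k + 1):
            acc = 0
            for di in range(k):
                for dj in range(k):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def conv2d_line_buffer(rows, kernel):
    """Cache-style: rows arrive as a stream; only k rows are ever buffered."""
    k = len(kernel)
    buf = []  # the "cache": holds at most k input rows at a time
    out = []
    for row in rows:  # rows may be any iterator, i.e. streamed input
        buf.append(row)
        if len(buf) > k:
            buf.pop(0)  # drop the oldest row once it leaves the window
        if len(buf) == k:
            w = len(row)
            out_row = []
            for j in range(w - k + 1):
                acc = 0
                for di in range(k):
                    for dj in range(k):
                        acc += buf[di][j + dj] * kernel[di][dj]
                out_row.append(acc)
            out.append(out_row)
    return out


# Both variants produce the same result; the second needs only k rows of storage.
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 0], [0, 1]]
assert conv2d_full_buffer(image, kernel) == conv2d_line_buffer(iter(image), kernel)
```

In a hardware realization, the line buffer corresponds to on-chip cache storage sized by the kernel height, which is the intermediate point between buffering the full input (memoryful) and re-sequencing the input to buffer nothing (memoryless).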

