Proceedings Paper

Iris: Automatic Generation of Efficient Data Layouts for High Bandwidth Utilization

Optimizing data movements is crucial in dealing with the challenges of data deluge and big data applications in heterogeneous computing. Although modern high-level synthesis (HLS) tools are efficient in optimizing computational aspects, there is still room for improvement in data transfers. Novel architectures, such as High-Bandwidth Memory with wider data busses, have been developed to address this issue. However, designers need to tailor their hardware/software interfaces to fully utilize the available bandwidth. We propose a methodology that automates the discovery and implementation of a data layout to maximize the available bandwidth when streaming data between memory and an accelerator.
Optimizing data movement is becoming one of the biggest challenges in heterogeneous computing, driven by the data deluge and the big data applications that follow from it. When creating specialized accelerators, modern high-level synthesis (HLS) tools are increasingly efficient at optimizing the computational aspects, but data transfers have not improved to the same degree. To address this, novel architectures such as High-Bandwidth Memory with wider data busses have been developed so that more data can be transferred in parallel. Designers must tailor their hardware/software interfaces to fully exploit the available bandwidth. HLS tools can automate this process, but only if the designer follows strict coding-style rules. If the bus width is not evenly divisible by the data width (e.g., when using custom-precision data types) or if the arrays are not of power-of-two length, the HLS-generated accelerator will likely not fully utilize the available bandwidth, demanding even more manual effort from the designer. We propose a methodology to automatically find and implement a data layout that, when streamed between memory and an accelerator, uses a higher percentage of the available bandwidth than a naive or HLS-optimized design. We borrow concepts from multiprocessor scheduling to achieve such high efficiency.
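The bandwidth loss described above can be made concrete with a small calculation. The following Python sketch is illustrative only and is not the layout algorithm proposed in the paper; the function names and the 64-bit bus and 24-bit element widths are assumptions chosen for the example. It compares the fraction of transferred bits that carry payload when each bus word holds only whole elements against the fraction when elements are packed back to back.

```python
from math import ceil


def naive_utilization(bus_bits: int, elem_bits: int, n_elems: int) -> float:
    """Payload fraction when no element may straddle two bus words.

    Hypothetical helper for illustration; not taken from the paper.
    """
    per_word = bus_bits // elem_bits  # whole elements that fit in one word
    if per_word == 0:
        raise ValueError("element is wider than the bus")
    words = ceil(n_elems / per_word)  # bus words needed for the array
    return (n_elems * elem_bits) / (words * bus_bits)


def packed_utilization(bus_bits: int, elem_bits: int, n_elems: int) -> float:
    """Payload fraction when elements are packed contiguously across words.

    Hypothetical helper for illustration; not taken from the paper.
    """
    payload = n_elems * elem_bits
    words = ceil(payload / bus_bits)  # only the last word can be partly empty
    return payload / (words * bus_bits)


if __name__ == "__main__":
    # Assumed example: 100 elements of 24 bits each on a 64-bit bus.
    print(naive_utilization(64, 24, 100))
    print(packed_utilization(64, 24, 100))
    # When the bus width is a multiple of the element width, nothing is lost.
    print(naive_utilization(64, 32, 100))
```

With a 24-bit element on a 64-bit bus, only two elements fit per word in the naive layout, so a quarter of every word is padding. Contiguous packing recovers most of that, at the cost of more complex unpacking logic on the accelerator side, which is the kind of trade-off an automated layout search would have to weigh.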
