Article

Portrait: A holistic computation and bandwidth balanced performance evaluation model for heterogeneous systems

Publisher

ELSEVIER
DOI: 10.1016/j.suscom.2022.100724

Keywords

Accelerators; Heterogeneous systems; Bandwidth contention; Hardware hazard; PCIe

Funding

  1. National Natural Science Foundation of China [61532017, 61572470, 61432017, 61521092, 61376043]
  2. Youth Innovation Promotion Association, CAS [Y404441000]

Accelerators are widely used in various domains, but bandwidth contention and hardware hazards in CPU-accelerator heterogeneous systems significantly bottleneck performance. To address this problem, a holistic profiling system called Portrait is proposed to model computation and bandwidth resources accurately and to improve task scheduling efficiency.
Accelerators are widely used in specific domains such as deep learning, streaming computation, and database query processing. To be usable, an accelerator has to be attached to a primary controller, typically a CPU. Such CPU-accelerator heterogeneous systems are the mainstream of current computer systems, and customized optimization for accelerators boosts their performance and energy efficiency. However, bandwidth contention on the CPU-accelerator interconnect and hardware hazards between multiple tasks on the accelerators significantly bottleneck the designed performance. On the one hand, limited interconnect bandwidth causes bandwidth contention when tasks are offloaded from the CPU to accelerators. On the other hand, limited hardware resources on accelerators cause hardware hazards during task execution. To take full advantage of the designed computing power of a CPU-accelerator heterogeneous system, it is necessary to mitigate both kinds of contention. This is hard for programmers and users because of the complexity of both the computing tasks and the system behaviors. CPU-GPU heterogeneous systems have been studied extensively in the state of the art, but CPU-FPGA heterogeneous systems have seldom been analyzed comprehensively. To help address this problem, we propose Portrait, a holistic profiling system that models both computation and bandwidth resources in a CPU-accelerator heterogeneous system and quantifies the bandwidth requirement and execution time of given tasks. Experiments show that Portrait increases the accuracy of the bandwidth requirement estimate to 97.71% on average, which is 1.95x that of the state of the art. It also estimates computation latency more accurately than the state of the art, which fails to evaluate accelerator behaviors, and it increases the accuracy of task execution latency to over 97.47% on average. Additionally, based on precise profiling of the CPU-accelerator heterogeneous system, Portrait could help task scheduling mitigate bandwidth contention and hardware hazards more effectively and thereby improve system throughput.
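
To make the notion of bandwidth contention during task offloading concrete, the following is a minimal toy sketch. It is not Portrait's model, which the abstract does not specify. It assumes, purely for illustration, that concurrent transfers share the CPU-accelerator link equally and that transfer and computation do not overlap. The names `Task` and `estimate_latency` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    bytes_to_transfer: float  # data offloaded from CPU to accelerator, in bytes
    compute_seconds: float    # accelerator execution time, ignoring hardware hazards


def estimate_latency(task: Task, link_bandwidth: float,
                     concurrent_transfers: int = 1) -> float:
    """Toy latency estimate in seconds.

    Assumption (not from the paper): the link bandwidth (bytes/s) is split
    equally among concurrent transfers, and transfer time simply adds to
    compute time.
    """
    if concurrent_transfers < 1:
        raise ValueError("concurrent_transfers must be at least 1")
    effective_bandwidth = link_bandwidth / concurrent_transfers
    transfer_seconds = task.bytes_to_transfer / effective_bandwidth
    return transfer_seconds + task.compute_seconds


if __name__ == "__main__":
    task = Task("example", bytes_to_transfer=8e9, compute_seconds=0.5)
    link = 16e9  # hypothetical interconnect bandwidth in bytes per second
    print(estimate_latency(task, link))                          # task alone on the link
    print(estimate_latency(task, link, concurrent_transfers=2))  # link shared by two tasks
```

Under these assumptions, sharing the link between two offloads doubles the transfer portion of the latency while the compute portion is unchanged, which is the kind of effect a bandwidth-aware profiler would need to capture.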
