4.6 Article

Efficient algorithms for task mapping on heterogeneous CPU/GPU platforms for fast completion time

Journal

JOURNAL OF SYSTEMS ARCHITECTURE
Volume 114, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.sysarc.2020.101936

Keywords

GPU; Heterogeneous scheduling; Data-size-based prediction; Neural network runtime acceleration

Funding

  1. National Natural Science Foundation of China [61902169]
  2. Shenzhen Peacock Plan, China [KQTD2016112514355531]
  3. Science and Technology Innovation Committee Foundation of Shenzhen, China [JCYJ20170817110848086]
  4. Climbing Project of China

The paper introduces a theoretical framework and practical mapping algorithms for solving computation and data mapping problems in GPU-based embedded systems. Experimental results show that these algorithms can achieve faster completion times compared to state-of-the-art techniques, and perform consistently well across different workloads.
In GPU-based embedded systems, mapping computation and data for multiple applications while minimizing the completion time is challenging because of the large size of the policy space. To achieve fast completion time, heterogeneous embedded systems need a fine-grained mapping framework that explores a set of critical factors. In this paper, we present a theoretical framework that yields a sub-optimal solution via three practical mapping algorithms with low time complexity. We evaluate these algorithms on StarPU with a large set of popular benchmarks. Experimental results demonstrate that the algorithms proposed in the original EMSOFT paper can achieve up to 30% faster completion time than state-of-the-art mapping techniques, and perform consistently well across different workloads. We further extend these algorithms to minimize the completion time and enhance the runtime performance of complex heterogeneous applications under resource-limited infrastructure. We also extend the evaluation by deploying StarPU under multiple setups with an additional benchmark suite that simulates real-world runtime neural networks. Experimental results demonstrate that our extended algorithm achieves much faster completion time (30% to 37% on average under multiple resource-constrained scenarios) than the state-of-the-art mapping techniques.
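
To make the mapping problem concrete, the sketch below shows a generic greedy heuristic that assigns independent tasks to CPU and GPU workers so as to keep the overall completion time low. It is not one of the paper's three algorithms, which the abstract does not describe, and it does not use the StarPU API. The `Worker` class, the `predict_runtime` cost model with its coefficients, and the `greedy_map` function are all hypothetical names introduced here for illustration; the cost model simply assumes that runtime grows linearly with input data size, loosely in the spirit of the "data-size-based prediction" keyword.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    kind: str              # "cpu" or "gpu"
    ready_at: float = 0.0  # time at which this worker becomes free


def predict_runtime(data_size: float, kind: str) -> float:
    """Hypothetical linear cost model: fixed overhead plus per-item cost.

    The coefficients are made up for illustration: the GPU pays a launch and
    transfer overhead but processes each item faster than the CPU.
    """
    overhead, per_item = {"cpu": (0.0, 1.0), "gpu": (5.0, 0.125)}[kind]
    return overhead + per_item * data_size


def greedy_map(task_sizes: dict, workers: list):
    """Assign each task to the worker that would finish it earliest.

    Tasks are considered largest first, a common list-scheduling heuristic.
    Returns the task-to-worker mapping and the resulting completion time.
    """
    mapping = {}
    for task, size in sorted(task_sizes.items(), key=lambda kv: -kv[1]):
        best = min(workers,
                   key=lambda w: w.ready_at + predict_runtime(size, w.kind))
        best.ready_at += predict_runtime(size, best.kind)
        mapping[task] = best.name
    completion_time = max(w.ready_at for w in workers)
    return mapping, completion_time


if __name__ == "__main__":
    workers = [Worker("cpu0", "cpu"), Worker("gpu0", "gpu")]
    mapping, completion_time = greedy_map({"a": 100, "b": 10, "c": 2}, workers)
    print(mapping, completion_time)
```

In this toy setup the large task goes to the GPU, where the fixed overhead is amortized, while the small tasks stay on the CPU. A real mapper would also have to account for data placement and transfer costs between devices, which is part of what makes the policy space large.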
