4.6 Article

Scheduling Spark Tasks With Data Skew and Deadline Constraints

Journal

IEEE ACCESS
Volume 9, Issue -, Pages 2793-2804

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2020.3040719

Keywords

Data skew; Spark; scheduling optimization; cloud computing

Funding

  1. National Key Research and Development Program of China [2018YFB1402500]
  2. National Natural Science Foundation of China [61872077, 61832004]
  3. National Hi-Tech Project [315055101]
  4. Project of Advanced Research of the Leading Professional Teachers in Higher Vocational Colleges in Jiangsu Province [2019GRFX078]
  5. Collaborative Innovation Center of Wireless Communications Technology

This paper investigates Spark task scheduling with data skew and deadline constraints and proposes a scheduling algorithm that aims to minimize total rental cost. In the reported experiments, it outperforms two existing algorithms.
Data skew strongly affects the performance of big data processing. This paper considers Spark task scheduling with data skew and deadline constraints, with the objective of minimizing the total rental cost. A modified scheduling architecture is developed based on the unique characteristics of the problem. A mathematical model is constructed, and a Spark task scheduling algorithm is proposed that accounts for both data skew and deadline constraints. The algorithm consists of three components: stage sequencing, task scheduling, and scheduling adjustment. Strategies are presented for each component. The parameters and components of the proposed algorithm are calibrated over many random instances. The calibrated algorithm is then compared with two existing algorithms for similar problems on classical scientific workflow applications. Experimental results show that the proposed algorithm statistically outperforms both.
