Proceedings Paper

Resource and Deadline-aware Job Scheduling in Dynamic Hadoop Clusters

As Hadoop becomes increasingly popular for large-scale data analysis, there is a growing need to provide predictable services to users who have strict requirements on job completion times. While earliest deadline first (EDF)-like scheduling algorithms are popular for guaranteeing job deadlines in real-time systems, they are not effective in a dynamic Hadoop environment, i.e., a Hadoop cluster with dynamically available resources. As a growing number of Hadoop clusters are deployed on hybrid systems, e.g., infrastructure powered by a mix of traditional and renewable energy, and cloud platforms hosting heterogeneous workloads, variable resource availability becomes common when running Hadoop jobs. In this paper, we propose RDS, a Resource and Deadline-aware Hadoop job Scheduler that takes future resource availability into consideration when minimizing job deadline misses. We formulate the job scheduling problem as an online optimization problem and solve it using an efficient receding horizon control algorithm. To aid the control, we design a self-learning model to estimate job completion times and use a simple but effective model to predict future resource availability. We have implemented RDS in the open-source Hadoop implementation and performed evaluations with various benchmark workloads. Experimental results show that RDS reduces the penalty of deadline misses by at least 36% and 10% compared with the Fair Scheduler and the EDF scheduler, respectively.
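The gap between plain EDF and a scheduler that looks ahead at resource availability can be illustrated with a toy simulation. The sketch below is not the RDS algorithm from the paper: it is a simplified stand-in that assumes discrete time slots, integer work units, a perfect capacity forecast, and a greedy admission test in place of the paper's optimization. It counts missed deadlines rather than a penalty, and the names `Job`, `simulate`, and `horizon` are hypothetical. At every slot it re-plans over the forecast window and applies only the current slot's decision, which is the receding horizon idea in its simplest form.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    work: int      # total work in resource-slots (assumed positive)
    deadline: int  # the job must be finished by the start of this slot


def simulate(jobs, capacity, horizon):
    """Return the set of job names that miss their deadline.

    capacity[t] is the number of resource units available in slot t.
    horizon is how many slots of future capacity the scheduler inspects;
    the forecast is assumed perfect here. horizon=0 degenerates to plain EDF.
    """
    remaining = {j.name: j.work for j in jobs}
    finish = {}
    by_deadline = sorted(jobs, key=lambda j: j.deadline)

    for t in range(len(capacity)):
        forecast = capacity[t:t + horizon]  # slots t .. t+horizon-1
        pending = [j for j in by_deadline if remaining[j.name] > 0]

        # Greedy admission in deadline order: admit a job only if it, plus
        # everything admitted before it, fits in the capacity forecast up to
        # its deadline. Jobs that cannot make it are deferred.
        admitted, deferred, load = [], [], 0
        for j in pending:
            reachable = sum(forecast[:max(0, j.deadline - t)])
            if load + remaining[j.name] <= reachable:
                admitted.append(j)
                load += remaining[j.name]
            else:
                deferred.append(j)

        # Apply only the current slot: admitted jobs first (EDF order),
        # deferred jobs get whatever capacity is left over.
        free = capacity[t]
        for j in admitted + deferred:
            used = min(free, remaining[j.name])
            remaining[j.name] -= used
            free -= used
            if used and remaining[j.name] == 0:
                finish[j.name] = t + 1

    return {j.name for j in jobs
            if finish.get(j.name, float("inf")) > j.deadline}


jobs = [Job("A", 3, 2), Job("B", 2, 3), Job("C", 2, 4)]
capacity = [2, 1, 1, 2]  # resource units per slot, varying over time

print(sorted(simulate(jobs, capacity, horizon=0)))  # ['B', 'C']
print(sorted(simulate(jobs, capacity, horizon=4)))  # ['B']
```

In this overloaded example (7 units of work, 6 units of capacity), plain EDF spends scarce capacity on job B, which cannot finish on time anyway, and job C misses its deadline as a result. The look-ahead variant sees from the forecast that B is infeasible, defers it, and loses only one job.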
