Article

Enhancing Reliability of Workflow Execution Using Task Replication and Spot Instances

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/2815624

Keywords

Algorithms; Reliability; Performance; Experimentation; Fault tolerance; workflows; cloud; scheduling; spot instances; task duplication; task retry

Funding

  1. ARC (Australian Research Council)

Abstract

Cloud environments offer low-cost computing resources as a subscription-based service. These resources are elastically scalable and dynamically provisioned. Cloud providers have also pioneered cost-effective pricing models such as spot instances. As a result, scientific workflows are increasingly adopting cloud computing. However, spot instances are terminated when the market price exceeds the user's bid price. Moreover, the cloud is not a utopian environment: failures are inevitable in such large, complex distributed systems, and it is well documented that cloud resources experience fluctuations in delivered performance. These challenges make fault tolerance an important criterion in workflow scheduling. This article presents an adaptive, just-in-time scheduling algorithm for scientific workflows. The algorithm judiciously uses both spot and on-demand instances to reduce cost and provide fault tolerance. It also consolidates resources to further minimize execution time and cost. Extensive simulations show that the proposed heuristics are fault tolerant and effective, especially under short deadlines, providing robust schedules with minimal makespan and cost.
