Article

Enhancing Reliability of Workflow Execution Using Task Replication and Spot Instances

ACM TRANSACTIONS ON AUTONOMOUS AND ADAPTIVE SYSTEMS (2016)

Journal

ACM TRANSACTIONS ON AUTONOMOUS AND ADAPTIVE SYSTEMS

Volume 10, Issue 4, Pages -

Publisher

ASSOC COMPUTING MACHINERY

DOI: 10.1145/2815624

Keywords

Algorithms; Reliability; Performance; Experimentation; Fault tolerance; workflows; cloud; scheduling; spot instances; task duplication; task retry

Categories

Computer Science, Artificial Intelligence; Computer Science, Information Systems; Computer Science, Theory & Methods

Funding

ARC (Australian Research Council)

Abstract

Cloud environments offer low-cost computing resources as a subscription-based service. These resources are elastically scalable and dynamically provisioned. Furthermore, cloud providers have also pioneered new pricing models like spot instances that are cost-effective. As a result, scientific workflows are increasingly adopting cloud computing. However, spot instances are terminated when the market price exceeds the user's bid price. Likewise, the cloud is not a utopian environment. Failures are inevitable in such large complex distributed systems. It is also well studied that cloud resources experience fluctuations in the delivered performance. These challenges make fault tolerance an important criterion in workflow scheduling. This article presents an adaptive, just-in-time scheduling algorithm for scientific workflows. This algorithm judiciously uses both spot and on-demand instances to reduce cost and provide fault tolerance. The proposed scheduling algorithm also consolidates resources to further minimize execution time and cost. Extensive simulations show that the proposed heuristics are fault tolerant and are effective, especially under short deadlines, providing robust schedules with minimal makespan and cost.
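The abstract states one concrete rule (a spot instance is terminated when the market price exceeds the user's bid price) and one design goal (mixing spot and on-demand instances for cost and fault tolerance). The Python sketch below illustrates that rule and a toy selection policy only. It is not the scheduling algorithm from the article: the `Task` fields, the `choose_instance` function, and the `retry_margin` parameter are hypothetical names and assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    runtime_h: float  # estimated runtime in hours (hypothetical input)
    slack_h: float    # spare time before the deadline, in hours (hypothetical input)


def spot_is_terminated(market_price: float, bid_price: float) -> bool:
    """Rule from the abstract: a spot instance is terminated when the
    market price exceeds the user's bid price."""
    return market_price > bid_price


def choose_instance(task: Task, market_price: float, bid_price: float,
                    retry_margin: float = 1.0) -> str:
    """Toy policy (an assumption, not the article's heuristic): use a spot
    instance only if the bid currently holds and the task has enough slack
    to be rerun once if the spot instance is lost."""
    if spot_is_terminated(market_price, bid_price):
        return "on-demand"
    if task.slack_h >= retry_margin * task.runtime_h:
        return "spot"
    return "on-demand"
```

Under these assumptions, a task with a tight deadline falls back to an on-demand instance, while a task with ample slack takes the cheaper spot instance and can be retried if the instance is revoked.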
