Article

Forseti: Dynamic chunk-level reshaping for data processing on heterogeneous clusters

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING (2023)

Journal

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING

Volume 171, Issue -, Pages 14-23

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE

DOI: 10.1016/j.jpdc.2022.09.003

Keywords

Distributed computing; Scheduling

Category

Computer Science, Theory & Methods

Abstract

Data-intensive computing frameworks typically split a job's workload into fixed-size chunks, which are processed as parallel tasks on distributed machines. Ideally, when the machines are homogeneous and equally fast, chunks of equal size would finish processing at the same time. In practice, however, such determinism in processing time cannot be guaranteed: processing times diverge because of system dynamics, machine heterogeneity, and variable network conditions. This variation, together with dynamics and uncertainty during task processing, can significantly degrade job-level performance, because residual chunk workload and stragglers create long tails in job completion time. In this paper, we propose Forseti, a novel processing scheme that reshapes data chunk sizes on the fly with respect to heterogeneous machines and a dynamic execution environment. Forseti mitigates residual workload and stragglers to achieve significant performance improvements. Forseti is a fully online scheme and requires no a priori knowledge of either the machine configuration or job statistics. Instead, it infers such information and adjusts data chunk sizes at runtime, making the solution robust even in highly volatile environments. In its implementation, Forseti also exploits a virtual machine reuse feature to avoid the start-up and initialization costs associated with launching new tasks. We prototype Forseti on a real-world cluster and evaluate its performance using several realistic benchmarks. The results show that Forseti outperforms a number of baselines in terms of average job completion time, including default Hadoop by up to 68% and SkewTune by up to 50%. (c) 2022 Elsevier Inc. All rights reserved.
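The abstract describes the mechanism only at a high level: speeds of heterogeneous machines are inferred at runtime, and chunk sizes are adjusted so that machines finish together, without a priori knowledge. Forseti's actual reshaping rule is not given here, so the following is a hypothetical sketch of that general idea, not the paper's algorithm. It assumes (1) each worker's speed is estimated online as an exponentially weighted moving average (EWMA) of observed records per second, with an assumed smoothing factor of 0.5, and (2) the remaining workload is split in proportion to those estimates so that every worker is predicted to finish at the same time. `WorkerStats` and `reshape_chunks` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class WorkerStats:
    """Hypothetical online speed estimate for one worker (not from the paper)."""
    alpha: float = 0.5             # assumed EWMA smoothing factor
    rate: Optional[float] = None   # estimated records per second; None until observed

    def observe(self, records: int, seconds: float) -> None:
        """Fold one completed chunk's measured throughput into the estimate."""
        if seconds <= 0:
            raise ValueError("seconds must be positive")
        sample = records / seconds
        if self.rate is None:
            self.rate = sample
        else:
            self.rate = self.alpha * sample + (1 - self.alpha) * self.rate


def reshape_chunks(remaining: int, stats: Dict[str, WorkerStats]) -> Dict[str, int]:
    """Split `remaining` records in proportion to each worker's estimated rate.

    Proportional shares mean every worker's predicted finish time
    (size / rate) is the same, which is the goal of avoiding long tails.
    """
    # Assumption: an unobserved worker is treated as average-speed.
    known = [s.rate for s in stats.values() if s.rate is not None]
    default = sum(known) / len(known) if known else 1.0
    rates = {w: (s.rate if s.rate is not None else default) for w, s in stats.items()}
    total_rate = sum(rates.values())
    if total_rate <= 0:
        # No usable speed information: fall back to equal shares.
        rates = {w: 1.0 for w in rates}
        total_rate = float(len(rates))

    # Round cumulative boundaries rather than individual shares, so the
    # sizes are non-negative integers that always sum to `remaining`.
    sizes: Dict[str, int] = {}
    workers = list(rates)
    cum_rate, prev_boundary = 0.0, 0
    for i, w in enumerate(workers):
        cum_rate += rates[w]
        if i == len(workers) - 1:
            boundary = remaining
        else:
            boundary = round(remaining * cum_rate / total_rate)
        sizes[w] = boundary - prev_boundary
        prev_boundary = boundary
    return sizes


if __name__ == "__main__":
    stats = {"fast": WorkerStats(), "slow": WorkerStats()}
    stats["fast"].observe(records=300, seconds=1.0)
    stats["slow"].observe(records=100, seconds=1.0)
    print(reshape_chunks(1000, stats))  # {'fast': 750, 'slow': 250}
```

With one worker observed at 300 records/s and another at 100 records/s, the 1000 remaining records are split 750/250, so both are predicted to finish after 2.5 s. Proportional splitting is only one possible policy; it ignores per-chunk overheads such as the task start-up cost that, according to the abstract, Forseti's virtual machine reuse avoids.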
