Article

Coordinative Scheduling of Computation and Communication in Data-Parallel Systems

IEEE TRANSACTIONS ON COMPUTERS (2021)

Journal

IEEE TRANSACTIONS ON COMPUTERS

Volume 70, Issue 12, Pages 2182-2197

Publisher

IEEE COMPUTER SOC

DOI: 10.1109/TC.2020.3039238

Keywords

Task analysis; Processor scheduling; Schedules; Scheduling; Parallel processing; Optimization; Cluster computing; Data-parallel jobs; job scheduling; coflow scheduling; synchronization; job completion time

Funding

National Key Research & Development Program of China [2018YFB0204300]
National Natural Science Foundation of China [62025208, 61932001]

Automated Summary

This research introduces a new scheduling-unit abstraction, coBranch, which accounts for the dependencies between computation stages and coflows so that coflows and jobs can be scheduled jointly. An urgency-based mechanism schedules coBranches to increase inter-job parallelism and thereby reduce the average job completion time (JCT). The method reduced the average JCT relative to the shortest-job-first critical-path and FIFO methods in both prototype-based experiments and large-scale simulations.

Abstract

In many data-parallel computing systems such as Spark, a job usually consists of multiple computation stages and inter-stage communication (i.e., coflows). Much effort has been devoted to scheduling coflows and jobs independently. Simply combining coflow scheduling with job scheduling, however, prolongs the average job completion time (JCT) because the two conflict. For this reason, we propose a new scheduling-unit abstraction, named coBranch, which takes the dependency between computation stages and coflows into consideration in order to schedule coflows and jobs jointly. In addition, mainstream coflow schedulers are order-preserving, i.e., all coflows of a high-priority job are prioritized over those of a low-priority job. We observe that this order-preserving constraint results in low inter-job parallelism. To overcome the problem, we employ an urgency-based mechanism to schedule coBranches, which aims to decrease the average JCT by enhancing inter-job parallelism. We implement the urgency-based coBranch Scheduling (BS) method on Apache Spark, conduct prototype-based experiments, and evaluate its performance against the shortest-job-first critical-path method and the FIFO method. Results show that our method reduces the average JCT by around 10 and 15 percent, respectively. Large-scale simulations based on the Google trace show larger gains: our method reduces JCT by 23 and 35 percent, respectively.
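To make the evaluation metric concrete, the following is a minimal sketch of how average JCT depends on job order, using the two baseline orderings named in the abstract (FIFO and shortest-job-first). It is a toy single-resource model with hypothetical job durations, not the coBranch method or the paper's experimental setup; the function name `average_jct` and the durations are assumptions made for illustration only.

```python
from itertools import accumulate


def average_jct(durations):
    """Average completion time when jobs run back to back in the given order.

    Toy model: one shared resource, no parallelism, no coflows.
    """
    completion_times = list(accumulate(durations))
    return sum(completion_times) / len(completion_times)


# Hypothetical job durations (arbitrary time units), listed in arrival order.
arrival_order = [8, 2, 4]

fifo_avg = average_jct(arrival_order)         # FIFO: run in arrival order
sjf_avg = average_jct(sorted(arrival_order))  # SJF: run the shortest job first

print(f"FIFO average JCT: {fifo_avg:.2f}")
print(f"SJF average JCT:  {sjf_avg:.2f}")
```

In this toy model the shortest-job-first order yields the lower average, because long jobs no longer delay the short ones queued behind them. The sketch only illustrates why ordering decisions affect average JCT; it says nothing about how coBranches or urgency are defined in the paper.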
