4.7 Article

Network-Aware Locality Scheduling for Distributed Data Operators in Data Centers

Journal

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPDS.2021.3053241

Keywords

Distributed databases; Bandwidth; Scheduling; Data centers; Optimization; Processor scheduling; Big Data; Data locality; coflow scheduling; distributed operators; SDN; metaheuristic

Funding

  1. Beijing Municipal Science & Technology Commission [Z181100005118016]
  2. National Natural Science Foundation of China [61874124, 61876173]
  3. European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie Grant [799066]

Large data centers are the mainstream infrastructure for big data processing, yet executing distributed data operators efficiently remains a challenge. Current methods optimize either application-layer data locality or network-layer data flows, but each in isolation. The NEAL approach bridges this gap and aims to reduce communication time for distributed big data operators.
Large data centers are currently the mainstream infrastructure for big data processing. The efficient execution of distributed data operators (e.g., join and aggregation) is one of the most fundamental tasks in these environments, yet it still challenges current data systems, and one of the key performance issues is network communication time. State-of-the-art methods that address this problem focus either on application-layer data locality optimization, to reduce network traffic, or on network-layer data flow optimization, to increase bandwidth utilization. However, the techniques in the two layers are entirely independent of each other, and the performance gains available from joint optimization have not yet been explored. In this article, we propose a novel approach called NEAL (NEtwork-Aware Locality scheduling) to bridge this gap and thereby further reduce communication time for distributed big data operators. We present the detailed design and implementation of NEAL, and our experimental results demonstrate that NEAL always performs better than current approaches across different workloads and network bandwidth configurations.
