4.7 Article

Network-Aware Locality Scheduling for Distributed Data Operators in Data Centers

Journal

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPDS.2021.3053241

Keywords

Distributed databases; Bandwidth; Scheduling; Data centers; Optimization; Processor scheduling; Big Data; Data locality; coflow scheduling; distributed operators; data centers; big data; SDN; metaheuristic

Funding

  1. Beijing Municipal Science & Technology Commission [Z181100005118016]
  2. National Natural Science Foundation of China [61874124, 61876173]
  3. European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie Grant [799066]
  4. Marie Curie Actions (MSCA) [799066]


Large data centers are the mainstream infrastructure for big data processing, yet executing distributed data operators efficiently remains a challenge. Current methods optimize either application-layer data locality or network-layer data flow, each independently of the other. The NEAL approach bridges this gap, aiming to reduce communication time for distributed big data operators.
Large data centers are currently the mainstream infrastructure for big data processing. The efficient execution of distributed data operators (e.g., join and aggregation), one of the most fundamental tasks in these environments, is still a challenge for current data systems, and one of the key performance issues is network communication time. State-of-the-art methods that address this problem focus either on application-layer data locality optimization to reduce network traffic or on network-layer data flow optimization to increase bandwidth utilization. However, the techniques in the two layers are entirely independent of each other, and the performance gains available from joint optimization have not yet been explored. In this article, we propose a novel approach called NEAL (NEtwork-Aware Locality scheduling) to bridge this gap and thereby further reduce communication time for distributed big data operators. We present the detailed design and implementation of NEAL, and our experimental results demonstrate that NEAL consistently performs better than current approaches across different workloads and network bandwidth configurations.
