Journal
IEEE TRANSACTIONS ON CLOUD COMPUTING
Volume 10, Issue 3, Pages 2163-2177
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCC.2020.2994195
Keywords
Geo-distributed data analytics; data placement; admission control; Lyapunov optimization; two-timescale approach
Funding
- NSFC General Technology Basic Research Joint Funds [U1836214]
- State Key Program of National Natural Science of China [61832013]
- Artificial Intelligence Science and Technology Major Project of Tianjin [18ZXZNGX00190]
- National Key R&D Program of China [2019QY1302, 2019YFB2102404]
- NSFC [61672379, 61872265, 61672131]
- NSFC-Guangdong Joint Funds [U1701263]
- Natural Science Foundation of Tianjin [18ZXZNGX00040]
- National Key R&D Program of China [2018YFB1004700]
- Science Innovation Foundation of Dalian [2019J12GX037]
This article focuses on the cost-throughput tradeoff problem in geo-distributed data analytics, aiming to minimize inter-DC traffic cost and maximize system throughput. By formulating a stochastic optimization problem and designing an online control framework, the proposed method achieves near-optimal solutions and maintains system stability and robustness.
In the era of global-scale services, analytical queries are performed on datasets that span multiple data centers (DCs). Such geo-distributed queries generate a large amount of inter-DC data transfer at run time. Because inter-DC bandwidth is expensive, various methods have been proposed to reduce the traffic cost of geo-distributed data analytics. However, existing methods do not address the throughput of geo-distributed analytics. In this article, we characterize and optimize a cost-throughput tradeoff problem in geo-distributed data analytics. Our objectives are two-fold: (1) to minimize the inter-DC traffic cost when serving geo-distributed analytics under uncertain query demand, and (2) to maximize the system throughput, measured as the number of query requests that can be served successfully with a guaranteed queuing delay. Specifically, we formulate a stochastic optimization problem that combines these two objectives. To solve it, we use Lyapunov optimization techniques to design and analyze a two-timescale online control framework. Without prior knowledge of future query requests, this framework makes online decisions on input data placement and on admission control of query requests. Rigorous theoretical analyses show that the framework achieves a near-optimal solution while maintaining system stability and robustness. Extensive trace-driven simulation results further demonstrate that the framework reduces inter-DC traffic cost, improves system throughput, and guarantees a maximum delay for each query request.
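To make the admission-control idea in the abstract more concrete, the following is a minimal sketch of a generic, textbook drift-plus-penalty threshold rule for a single request queue. It is not the article's algorithm: the abstract does not give the formulation, and the two-timescale data placement decision is left out entirely. All names here (`admit`, `step`, `simulate`, the trade-off weight `v`) are hypothetical, and the reward is assumed to be linear in the number of admitted requests.

```python
def admit(queue_len, arrivals, v):
    """Admit all arrivals when the backlog is at most v, otherwise admit none.

    Assumption: this threshold form is what maximizing
    v * admitted - queue_len * admitted over 0 <= admitted <= arrivals
    reduces to when the throughput reward is linear.
    """
    return arrivals if queue_len <= v else 0


def step(queue_len, arrivals, service, v):
    """Advance the queue by one slot: serve first, then add admitted requests."""
    admitted = admit(queue_len, arrivals, v)
    next_queue = max(queue_len - service, 0) + admitted
    return admitted, next_queue


def simulate(arrival_trace, service_rate, v):
    """Run the rule over a trace; return total admitted requests and peak backlog."""
    queue_len = 0
    total_admitted = 0
    peak = 0
    for arrivals in arrival_trace:
        admitted, queue_len = step(queue_len, arrivals, service_rate, v)
        total_admitted += admitted
        peak = max(peak, queue_len)
    return total_admitted, peak


if __name__ == "__main__":
    total, peak = simulate([5, 5, 5, 5], service_rate=2, v=6)
    print(total, peak)
```

Under this rule the backlog never exceeds `v` plus the largest per-slot arrival count, so a larger `v` admits more requests at the price of a longer queue. That is the general shape of the throughput-versus-delay trade-off that Lyapunov-based controllers expose, shown here only as an illustration.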