Article

Low Latency Geo-distributed Data Analytics

Journal

ACM SIGCOMM COMPUTER COMMUNICATION REVIEW
Volume 45, Issue 4, Pages 421-434

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/2829988.2787505

Keywords

geo-distributed; low latency; data analytics; network aware; WAN analytics

Funding

  1. NSF [CNS-1302041, CNS-1330308, CNS-1345249]

Abstract

Low latency analytics on geographically distributed datasets (across datacenters and edge clusters) is an emerging and increasingly important challenge. The dominant approach of aggregating all the data in a single datacenter significantly delays analytics results. At the same time, running queries over geo-distributed inputs using current intra-DC analytics frameworks also leads to high query response times, because these frameworks cannot cope with the relatively low and variable capacity of WAN links. We present Iridium, a system for low latency geo-distributed analytics. Iridium achieves low query response times by optimizing the placement of both the data and the tasks of queries. The joint data and task placement optimization, however, is intractable. Therefore, Iridium uses an online heuristic to redistribute datasets among the sites before queries arrive, and places tasks to reduce network bottlenecks during query execution. Finally, it also contains a knob to budget WAN usage. Evaluation across eight worldwide EC2 regions using production queries shows that Iridium speeds up queries by 3× to 19× and lowers WAN usage by 15% to 64% compared to existing baselines.
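The task-placement idea in the abstract can be illustrated with a toy sketch. This is not Iridium's actual algorithm; it is a minimal model under stated assumptions: each site's WAN uplink and downlink are the only bottlenecks, every site holds some intermediate data, and a site that runs a fraction r of the tasks keeps that fraction of its own data local and pulls the same fraction of every other site's data. The function names, the grid search, and the units (MB and MB/s) are hypothetical choices made for illustration.

```python
from itertools import product


def bottleneck_time(fractions, data, uplink, downlink):
    """Slowest WAN transfer (seconds) for a given split of tasks across sites.

    fractions[i]: share of tasks run at site i (shares sum to 1)
    data[i]:      intermediate data held at site i, in MB (assumed unit)
    uplink[i], downlink[i]: WAN bandwidth of site i, in MB/s (assumed unit)
    """
    total = sum(data)
    worst = 0.0
    for r, s, up, down in zip(fractions, data, uplink, downlink):
        upload = (1.0 - r) * s / up        # data this site must send out
        download = r * (total - s) / down  # data this site must pull in
        worst = max(worst, upload, download)
    return worst


def best_task_split(data, uplink, downlink, steps=20):
    """Grid-search task fractions (multiples of 1/steps) for the lowest bottleneck."""
    n = len(data)
    best = None
    for combo in product(range(steps + 1), repeat=n - 1):
        if sum(combo) > steps:
            continue  # fractions would exceed 1
        fractions = [c / steps for c in combo]
        fractions.append((steps - sum(combo)) / steps)  # last site takes the rest
        t = bottleneck_time(fractions, data, uplink, downlink)
        if best is None or t < best[0]:
            best = (t, fractions)
    return best
```

For two identical sites, `best_task_split([100, 100], [10, 10], [10, 10])` settles on an even split, whereas running every task at one site doubles the bottleneck transfer time in this model. The exhaustive grid search only scales to a handful of sites; it is used here purely to keep the sketch dependency-free.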
