Journal
PROCEEDINGS OF THE VLDB ENDOWMENT
Volume 14, Issue 9, Pages 1570-1582
Publisher
ASSOC COMPUTING MACHINERY
DOI: 10.14778/3461535.3461545
Recent advances in storage technologies have popularized the use of tiered storage systems in data-intensive compute clusters. The Hadoop Distributed File System (HDFS), for example, now supports storing data in memory, SSDs, and HDDs, while OctopusFS and hatS offer fine-grained storage tiering solutions. However, the task schedulers of big data platforms (such as Hadoop and Spark) assign tasks to available resources based only on data locality information, and completely ignore the fact that local data is now stored on a variety of storage media with different performance characteristics. This paper presents Trident, a principled task scheduling approach designed to make optimal task assignment decisions based on both locality and storage tier information. Trident formulates task scheduling as a minimum cost maximum matching problem in a bipartite graph and uses a standard solver to find the optimal solution. In addition, Trident utilizes two novel pruning algorithms that bound the size of the graph while still guaranteeing optimality. Trident is implemented in both Spark and Hadoop and evaluated extensively using a realistic workload derived from Facebook traces as well as an industry-validated benchmark, demonstrating significant benefits in terms of application performance and cluster efficiency.
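To make the matching formulation concrete, the following is a minimal sketch of minimum cost maximum matching between tasks and resource slots, where each edge cost reflects the storage tier of the data a task would read from that slot. It is not Trident's implementation: the abstract only says a standard solver is used, so the successive-shortest-path solver here, the function name `min_cost_max_matching`, the `TIER_COST` values, and the example tier layout are all illustrative assumptions.

```python
from collections import deque

# Hypothetical per-tier read costs (lower is better); not taken from the paper.
TIER_COST = {"MEMORY": 1, "SSD": 2, "HDD": 4, "REMOTE": 10}


def min_cost_max_matching(num_tasks, num_slots, cost):
    """Match as many tasks to slots as possible at minimum total cost.

    cost maps (task, slot) -> non-negative edge cost; absent pairs have no edge.
    Solved as unit-capacity min-cost max-flow via successive shortest paths.
    Returns (assignment dict task -> slot, total cost).
    """
    source, sink = 0, num_tasks + num_slots + 1
    n = sink + 1
    graph = [[] for _ in range(n)]  # node -> indices into the edge arrays
    to, cap, wt = [], [], []

    def add_edge(u, v, w):
        # Forward edge gets an even index, its reverse the next odd index.
        graph[u].append(len(to)); to.append(v); cap.append(1); wt.append(w)
        graph[v].append(len(to)); to.append(u); cap.append(0); wt.append(-w)

    for t in range(num_tasks):
        add_edge(source, 1 + t, 0)
    for s in range(num_slots):
        add_edge(1 + num_tasks + s, sink, 0)
    for (t, s), w in cost.items():
        add_edge(1 + t, 1 + num_tasks + s, w)

    total = 0
    while True:
        # Queue-based Bellman-Ford: cheapest augmenting path in the residual graph.
        dist = [float("inf")] * n
        prev_edge = [-1] * n
        in_queue = [False] * n
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            in_queue[u] = False
            for e in graph[u]:
                v = to[e]
                if cap[e] > 0 and dist[u] + wt[e] < dist[v]:
                    dist[v] = dist[u] + wt[e]
                    prev_edge[v] = e
                    if not in_queue[v]:
                        in_queue[v] = True
                        queue.append(v)
        if prev_edge[sink] == -1:
            break  # no augmenting path left: the matching is maximum
        v = sink
        while v != source:  # push one unit of flow back along the path
            e = prev_edge[v]
            cap[e] -= 1
            cap[e ^ 1] += 1
            v = to[e ^ 1]
        total += dist[sink]

    assignment = {}
    for t in range(num_tasks):
        for e in graph[1 + t]:
            if e % 2 == 0 and cap[e] == 0:  # saturated forward task -> slot edge
                assignment[t] = to[e] - 1 - num_tasks
    return assignment, total


# Example: tier of each task's input data as seen from each slot (assumed layout).
tiers = {(0, 0): "MEMORY", (0, 1): "SSD",
         (1, 0): "MEMORY", (1, 1): "HDD",
         (2, 2): "HDD"}
cost = {(t, s): TIER_COST[tiers.get((t, s), "REMOTE")]
        for t in range(3) for s in range(3)}
assignment, total = min_cost_max_matching(3, 3, cost)
print(assignment, total)
```

In this example, tasks 0 and 1 both have their data in memory at slot 0. A scheduler that considers only locality could give slot 0 to task 0 and leave task 1 to read from HDD at slot 1, whereas the tier-aware matching sends task 0 to its SSD copy at slot 1 and task 1 to slot 0, which lowers the total cost.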