Trident: Task Scheduling over Tiered Storage Systems in Big Data Platforms

Journal

PROCEEDINGS OF THE VLDB ENDOWMENT
Volume 14, Issue 9, Pages 1570-1582

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.14778/3461535.3461545

Recent advances in storage technologies have popularized the use of tiered storage systems in data-intensive compute clusters. The Hadoop Distributed File System (HDFS), for example, now supports storing data in memory, SSDs, and HDDs, while OctopusFS and hatS offer fine-grained storage tiering solutions. However, the task schedulers of big data platforms (such as Hadoop and Spark) assign tasks to available resources based only on data locality information, and ignore the fact that local data is now stored on a variety of storage media with different performance characteristics. This paper presents Trident, a principled task scheduling approach that is designed to make optimal task assignment decisions based on both locality and storage tier information. Trident formulates task scheduling as a minimum cost maximum matching problem in a bipartite graph and uses a standard solver to find the optimal solution. In addition, Trident uses two novel pruning algorithms to bound the size of the graph while still guaranteeing optimality. Trident is implemented in both Spark and Hadoop and evaluated extensively using a realistic workload derived from Facebook traces as well as an industry-validated benchmark, demonstrating significant benefits in terms of application performance and cluster efficiency.
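
To make the matching formulation in the abstract concrete, the following is a minimal toy sketch, not Trident's implementation. It treats tasks and free slots as the two sides of a bipartite graph and finds a minimum cost maximum matching by brute force. The tier costs, the remote-read cost, the node names, and the helper names (`TIER_COST`, `REMOTE_COST`, `assignment_cost`, `schedule`) are all assumptions made for illustration; the paper uses a standard solver and pruning algorithms that are not reproduced here.

```python
from itertools import permutations

# Hypothetical per-tier read costs (lower is better); not taken from the paper.
TIER_COST = {"MEMORY": 1, "SSD": 3, "HDD": 6}
REMOTE_COST = 10  # hypothetical cost when the node holds no replica of the data


def assignment_cost(replicas, node):
    """Cost of running a task on `node`, given {node: tier} for its input replicas."""
    tier = replicas.get(node)
    return TIER_COST[tier] if tier is not None else REMOTE_COST


def schedule(tasks, slots):
    """Brute-force minimum cost maximum matching of tasks to free slots.

    tasks: {task_id: {node: tier}}; slots: list of node names, one per free slot.
    Returns (total_cost, {task_id: slot_index}). Only suitable for tiny inputs.
    """
    task_ids = list(tasks)
    k = min(len(task_ids), len(slots))  # every task can run anywhere, so this is the matching size
    if len(task_ids) <= len(slots):
        candidates = (list(zip(task_ids, p)) for p in permutations(range(len(slots)), k))
    else:
        candidates = (list(zip(p, range(len(slots)))) for p in permutations(task_ids, k))
    best_cost, best = float("inf"), {}
    for pairs in candidates:
        cost = sum(assignment_cost(tasks[t], slots[s]) for t, s in pairs)
        if cost < best_cost:
            best_cost, best = cost, dict(pairs)
    return best_cost, best


# Three tasks, three free slots (one per node).
tasks = {
    "t1": {"n1": "MEMORY", "n2": "HDD"},
    "t2": {"n1": "SSD"},
    "t3": {"n2": "HDD", "n3": "SSD"},
}
slots = ["n1", "n2", "n3"]
cost, plan = schedule(tasks, slots)
print(cost, plan)
```

In this toy instance, giving `t1` its in-memory replica on `n1` would force `t2` into a remote read. The matching with the lowest total cost instead places `t2` on `n1` (SSD), `t1` on `n2` (HDD), and `t3` on `n3` (SSD), which is the kind of tier-aware trade-off a locality-only scheduler cannot see.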
