Article

Big Data-Oriented PaaS Architecture with Disk-as-a-Resource Capability and Container-Based Virtualization

JOURNAL OF GRID COMPUTING (2018)

Journal

JOURNAL OF GRID COMPUTING

Volume 16, Issue 4, Pages 587-605

Publisher

SPRINGER

DOI: 10.1007/s10723-018-9460-4

Keywords

Big data; Platform as a Service (PaaS); Cloud computing; Disk-as-a-resource scheduling; Operating-system-level virtualization

Funding

Ministry of Economy, Industry and Competitiveness of Spain [TIN2016-75845-P]
FPU Program of the Ministry of Education [FPU15/03381]

Abstract

With the increasing adoption of Big Data technologies as basic tools for the ongoing Digital Transformation, there is a high demand for data-intensive applications. To execute such applications efficiently, it is vital that cloud providers change the way hardware infrastructure resources are managed to improve their performance. However, the increasing use of virtualization technologies to achieve efficient usage of infrastructure resources continuously widens the gap between applications and the underlying hardware, thus decreasing resource efficiency for the end user. This scenario is especially troublesome for Big Data applications, as storage resources are among the most heavily virtualized, which imposes a significant overhead on large-scale data processing. This paper proposes a novel PaaS architecture specifically oriented to Big Data in which the scheduler offers disks as resources alongside the more common CPU and memory resources, with the aim of providing a better storage solution for the user. Furthermore, virtualization overheads are reduced to the bare minimum by replacing heavy hypervisor-based technologies with operating-system-level virtualization based on lightweight software containers. This architecture has been deployed on a Big Data infrastructure at the CESGA supercomputing center, used as a testbed to compare its performance with OpenStack, a popular private cloud platform. Results show significant performance improvements, reducing the execution time of representative Big Data workloads by up to 4.5x.
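The abstract does not specify how the scheduler represents or assigns disks. As a purely illustrative sketch of the disk-as-a-resource idea, the Python below treats whole disks as a schedulable quantity next to CPU and memory and places a container request with a simple first-fit rule. The `Node`, `Request`, and `place` names, the field layout, the device paths, and the first-fit policy are all assumptions made for illustration, not the paper's design.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical worker node advertising CPU, memory, and whole disks."""
    name: str
    free_cpus: int
    free_mem_gb: int
    free_disks: list[str] = field(default_factory=list)  # illustrative device paths


@dataclass
class Request:
    """Hypothetical resource request for one container."""
    cpus: int
    mem_gb: int
    disks: int  # number of dedicated disks wanted


def place(request: Request, nodes: list[Node]):
    """First-fit placement (an assumed policy, not the paper's).

    Returns (node name, list of assigned disks), or None if no node fits.
    The chosen node's free resources are decremented in place.
    """
    for node in nodes:
        fits = (
            node.free_cpus >= request.cpus
            and node.free_mem_gb >= request.mem_gb
            and len(node.free_disks) >= request.disks
        )
        if fits:
            assigned = node.free_disks[:request.disks]
            node.free_disks = node.free_disks[request.disks:]
            node.free_cpus -= request.cpus
            node.free_mem_gb -= request.mem_gb
            return node.name, assigned
    return None


if __name__ == "__main__":
    cluster = [
        Node("n1", free_cpus=4, free_mem_gb=16, free_disks=["/dev/sdb"]),
        Node("n2", free_cpus=8, free_mem_gb=32, free_disks=["/dev/sdb", "/dev/sdc"]),
    ]
    # A request for two dedicated disks skips n1, which has only one free disk.
    print(place(Request(cpus=2, mem_gb=8, disks=2), cluster))
```

In this sketch a request that cannot get its disks is rejected even when CPU and memory are available, which is the practical consequence of treating disks as a first-class resource rather than as shared virtualized storage.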
