Article

HDM: A Composable Framework for Big Data Processing

Journal

IEEE Transactions on Big Data
Volume 4, Issue 2, Pages 150-163

Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/TBDATA.2017.2690906

Keywords

Big data processing; parallel programming; functional programming; distributed systems; system architecture

Over the past few years, frameworks such as MapReduce and Spark have been introduced to ease the task of developing big data programs and applications. However, jobs in these frameworks are coarsely defined and packaged as executable JARs, without any of their functionality being exposed or described. As a result, deployed jobs are not natively composable or reusable for subsequent development, and optimizations cannot readily be applied across the data flow of job sequences and pipelines. In this paper, we present the Hierarchically Distributed Data Matrix (HDM), a functional, strongly-typed data representation for writing composable big data applications. Along with HDM, a runtime framework is provided to support the execution, integration and management of HDM applications on distributed infrastructures. Based on the functional data-dependency graph of HDM, multiple optimizations are applied to improve the performance of executing HDM jobs. Experimental results show that our optimizations achieve improvements of between 10 and 40 percent in job completion time for different types of applications, compared with the current state of the art, Apache Spark.
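The abstract's central idea, transformations recorded as a functional, strongly-typed data-dependency graph that remains composable and optimizable before execution, can be sketched as follows. This is a minimal, hypothetical illustration in Scala, not the paper's actual API; the names HDM, Source, MapOp, FilterOp and compute are assumptions made for the example.

// Hypothetical sketch (not the paper's real API): each transformation
// returns a new node in a typed data-dependency graph; nothing runs
// until compute() is called, so whole pipelines stay composable and
// open to optimization (e.g., fusing adjacent maps) before execution.
sealed trait HDM[T] {
  def map[U](f: T => U): HDM[U] = MapOp(this, f)
  def filter(p: T => Boolean): HDM[T] = FilterOp(this, p)
  // Naive local evaluator; a real runtime would plan, optimize and
  // distribute the graph across a cluster instead.
  def compute(): Seq[T]
}

final case class Source[T](data: Seq[T]) extends HDM[T] {
  def compute(): Seq[T] = data
}

final case class MapOp[A, B](parent: HDM[A], f: A => B) extends HDM[B] {
  def compute(): Seq[B] = parent.compute().map(f)
}

final case class FilterOp[A](parent: HDM[A], p: A => Boolean) extends HDM[A] {
  def compute(): Seq[A] = parent.compute().filter(p)
}

object Example extends App {
  // Pipelines compose and can be reused like ordinary values;
  // execution is deferred until compute() walks the graph.
  val doubled: HDM[Int] = Source(Seq(1, 2, 3, 4)).map(_ * 2)
  println(doubled.filter(_ > 4).compute()) // List(6, 8)
}

Because every transformation yields a value of the same abstract type, pipelines can be passed around, composed and inspected before the runtime executes them, which is the property that graph-level optimizations of the kind the abstract describes rely on.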
