3.8 Proceedings Paper

A Middleware for Managing Big-Data Flows

期刊

出版社

SPRINGER-VERLAG BERLIN

关键词

-

向作者/读者索取更多资源

Hadoop is being used for various diverse kinds of applications over diverse kinds of data. This makes developing and managing data flows over Hadoop MapReduce a complex task. Various scripting languages such as Hive, Pig, Jaql, etc., have been developed to hide the complexity of MapReduce applications from the user. But, even these high level query languages can get complex over-time and it is a non-trivial task even for a user proficient in these languages to develop, debug, and maintain these scripts. This paper presents a middleware for developing and maintaining MapReduce data flows. This middleware can be used to Extract data from diverse data sources, Load it into distributed file system, and Transform in a format which can be easily analyzed by the subsequent systems in a user friendly manner. MetaOperators are the backbone of our middleware. Using MetaOperators one can express a data-flow only by specifying the relevant inputs rather than worrying about data schema and the query syntax. A data-flow written using such MetaOperators localizes schema specific parts of the query to the MetaOperator parameters making the flow easier to develop, debug, and maintain. Using these MetaOperators we show how one can express operations over hierarchical as well as flat data in a similar manner, track data schema as it flows through the operators, and add a drag-and-drop GUI layer on top of this framework. This brings MapReduce application development in the realm of middle management.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

3.8
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据