Article

Big Data: Tutorial and guidelines on information and process fusion for analytics algorithms with MapReduce

INFORMATION FUSION (2018)

Journal

INFORMATION FUSION

Volume 42, Issue -, Pages 51-61

Publisher

ELSEVIER

DOI: 10.1016/j.inffus.2017.10.001

Keywords

Big Data Analytics; MapReduce; Information fusion; Spark; Machine learning

Categories

Computer Science, Artificial Intelligence; Computer Science, Theory & Methods

Funding

FEDER funds
Spanish Ministry of Science and Technology [TIN2014-57251-P, TIN2015-68454-R]
Foundation BBVA project BigDaPTOOLS [75/2016]

Abstract

We live in a world where data are generated from a myriad of sources, and it is very cheap to collect and store such data. However, the real benefit lies not in the data itself but in the algorithms capable of processing such data in a tolerable elapsed time and extracting valuable knowledge from it. Therefore, the use of Big Data Analytics tools provides very significant advantages to both industry and academia. The MapReduce programming framework can be highlighted as the main paradigm related to such tools. It is mainly characterized by distributed execution, which provides a high degree of scalability together with a fault-tolerant scheme. In every MapReduce algorithm, local models are first learned from a subset of the original data within the so-called Map tasks. Then, the Reduce task is devoted to fusing the partial outputs generated by each Map. The way this fusion of information/models is designed may have a strong impact on the quality of the final system. In this work, we enumerate and analyze two alternative methodologies that may be found both in the specialized literature and in standard Machine Learning libraries for Big Data. Our main objective is to provide an introduction to the characteristics of these methodologies, as well as to give some guidelines for the design of novel algorithms in this field of research. Finally, a short experimental study allows us to contrast the scalability issues of each type of process fusion in MapReduce for Big Data Analytics.
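The Map/Reduce split described in the abstract can be illustrated with a minimal single-machine sketch. It is not taken from the paper: the nearest-centroid model, the function names (`map_task`, `reduce_task`, `centroids`, `predict`) and the toy data are all assumptions chosen for illustration. Each Map task learns a local model from one data subset, and the Reduce step fuses the partial models into a final one.

```python
from functools import reduce


def map_task(partition):
    """Learn a local model from one data subset: per-class feature sums and counts."""
    model = {}
    for features, label in partition:
        sums, count = model.get(label, ([0.0] * len(features), 0))
        model[label] = ([s + x for s, x in zip(sums, features)], count + 1)
    return model


def reduce_task(model_a, model_b):
    """Fuse two partial models by adding their per-class sums and counts."""
    fused = dict(model_a)
    for label, (sums_b, count_b) in model_b.items():
        if label in fused:
            sums_a, count_a = fused[label]
            fused[label] = ([a + b for a, b in zip(sums_a, sums_b)], count_a + count_b)
        else:
            fused[label] = (sums_b, count_b)
    return fused


def centroids(model):
    """Turn fused sums and counts into one centroid per class."""
    return {label: [s / count for s in sums] for label, (sums, count) in model.items()}


def predict(class_centroids, features):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(
        class_centroids,
        key=lambda label: sum((x - c) ** 2 for x, c in zip(features, class_centroids[label])),
    )


# Hypothetical toy data, already split into three partitions.
partitions = [
    [([1.0, 1.0], "a"), ([9.0, 9.0], "b")],
    [([3.0, 1.0], "a"), ([7.0, 9.0], "b")],
    [([2.0, 4.0], "a")],
]

# On a real cluster the Map tasks would run in parallel; here they run in sequence.
local_models = [map_task(p) for p in partitions]
fused_model = reduce(reduce_task, local_models)
class_centroids = centroids(fused_model)
```

In this particular sketch the fusion is exact, because sums and counts combine without loss, so the fused centroids match those computed on the full dataset. Many learners do not decompose this way, and the fusion step then becomes a design decision, which is the concern the abstract raises.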
