Journal
INFORMATION
Volume 14, Issue 2, Pages -
Publisher
MDPI
DOI: 10.3390/info14020093
Keywords
Big Data; data management; data collection; data analysis; data processing; Hadoop; MapReduce; Spark; Mahout; MLlib
This paper proposes EverAnalyzer, a self-adjustable Big Data management platform that utilizes multiple frameworks to address different data processing and analysis scenarios. By collecting data and utilizing metadata, the platform is able to recommend the best framework for users. Experimental results demonstrate that EverAnalyzer correctly suggests the optimum framework in the majority of cases.
Big Data is a pervasive phenomenon, with new data being generated every second. Today's enterprises face major challenges from increasingly diverse data, as well as from indexing, searching, and analyzing such enormous volumes. Several frameworks and libraries for processing and analyzing Big Data exist. Among them, Hadoop MapReduce, Mahout, Spark, and MLlib appear to be the most popular, although it is unclear which of them suits and performs best in various data processing and analysis scenarios. This paper proposes EverAnalyzer, a self-adjustable Big Data management platform built to fill this gap by exploiting all of these frameworks. The platform collects data in both streaming and batch modes, and it records metadata from the processing and analysis tasks that its users apply to the collected data. Based on this metadata, the platform recommends the optimum framework for the data processing or analysis activities that users aim to execute. To verify the platform's efficiency, numerous experiments were carried out using 30 diverse datasets related to various diseases. The results revealed that EverAnalyzer correctly suggested the optimum framework in 80% of the cases.
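The abstract does not say how the recommendation is computed from the metadata. As a rough illustration only, the sketch below assumes one plausible rule: pick the framework with the lowest mean runtime over comparable past runs. The function name `recommend_framework`, the metadata schema, the `tolerance` parameter, and all runtime values are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean


def recommend_framework(history, task, size_mb, tolerance=0.5):
    """Return the framework with the lowest mean runtime on comparable past runs.

    history: iterable of (framework, task, size_mb, seconds) tuples.
             This schema is an assumption, not EverAnalyzer's actual metadata.
    A past run is comparable when it has the same task and a dataset size
    within +/- tolerance (as a fraction) of size_mb.
    Returns None when no comparable run exists.
    """
    runtimes = defaultdict(list)
    for framework, past_task, past_size, seconds in history:
        if past_task == task and abs(past_size - size_mb) <= tolerance * size_mb:
            runtimes[framework].append(seconds)
    if not runtimes:
        return None
    # Lowest average runtime wins.
    return min(runtimes, key=lambda fw: mean(runtimes[fw]))


# Made-up metadata records, for illustration only.
history = [
    ("Hadoop MapReduce", "wordcount", 100, 42.0),
    ("Spark", "wordcount", 120, 18.5),
    ("Spark", "wordcount", 90, 21.5),
    ("Mahout", "kmeans", 100, 60.0),
    ("MLlib", "kmeans", 110, 35.0),
]

print(recommend_framework(history, "wordcount", 100))
```

A rule like this can only recommend a framework once comparable runs have been recorded, which is why the sketch returns `None` for an unseen task instead of guessing.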