Article

Cloud-agnostic architectures for machine learning based on Apache Spark

Journal

ADVANCES IN ENGINEERING SOFTWARE
Volume 159

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.advengsoft.2021.103029

Keywords

Reference architectures; Big data; Artificial intelligence; Machine learning; Cloud computing; Orchestration; Distributed computing; Stream processing; Spark

Funding

  1. Ministry of Innovation and Technology NRDI Office [863448 (H2020-INFRAEOSC-2019-1)]
  2. Hungarian Scientific Research Fund [OTKA K 132838]
  3. Janos Bolyai Research Scholarship of the Hungarian Academy of Sciences


This paper discusses reference architectures for Big Data, machine learning, and stream processing, focusing on the Apache Spark platform and the cloud-agnostic orchestration tool Occopus. The new-generation reference architectures can be configured flexibly according to the available resources and cloud providers, support multi-cloud deployment, and were successfully used by the Institute for Political Science in the Hungarian Comparative Agendas Project (CAP).
Reference architectures for Big Data, machine learning, and stream processing include not only recommended practices and interconnected building blocks but also considerations for scalability, availability, manageability, and security. However, the automated deployment of multi-VM platforms on various clouds based on such reference architectures may raise several issues. The paper focuses on the widespread Apache Spark Big Data platform as the baseline and on the Occopus cloud-agnostic orchestrator tool. The set of new-generation reference architectures is configurable through human-readable descriptors according to the available resources and cloud providers, and offers various components such as Jupyter Notebook, RStudio, HDFS, and Kafka. These pre-configured reference architectures can be deployed automatically and on demand, even by the data scientist, using a multi-cloud approach across a wide range of cloud systems such as Amazon AWS, Microsoft Azure, OpenStack, OpenNebula, and CloudSigma. Occopus also enables the scaling of the cluster-oriented components (such as Spark) of the instantiated reference architectures. The presented solution was successfully used in the Hungarian Comparative Agendas Project (CAP) by the Institute for Political Science to classify newspaper articles.
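
The idea of a provider-independent descriptor with scalable cluster components can be illustrated with a minimal Python sketch. This is not the Occopus descriptor format or API: the names NodeSpec, ReferenceArchitecture, clamp, and plan, as well as the instance limits, are hypothetical and chosen only for illustration. The sketch assumes that each node type declares minimum and maximum instance counts, and that the same description is reused unchanged for different clouds.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class NodeSpec:
    """Hypothetical description of one node type in a reference architecture."""
    name: str
    min_instances: int = 1
    max_instances: int = 1

    def clamp(self, requested: int) -> int:
        # Keep the requested instance count inside the declared scaling bounds.
        return max(self.min_instances, min(self.max_instances, requested))


@dataclass
class ReferenceArchitecture:
    """Hypothetical cloud-independent description of a multi-VM platform."""
    name: str
    nodes: List[NodeSpec]

    def plan(self, cloud: str, requested: Dict[str, int]) -> List[Tuple[str, str, int]]:
        # The same description is reused for every cloud; only the target changes.
        # Node types without an explicit request start at their minimum size.
        return [
            (cloud, node.name, node.clamp(requested.get(node.name, node.min_instances)))
            for node in self.nodes
        ]


# Illustrative architecture: one Spark master and a scalable pool of workers.
spark = ReferenceArchitecture(
    name="spark-cluster",
    nodes=[
        NodeSpec("spark-master"),
        NodeSpec("spark-worker", min_instances=2, max_instances=10),
    ],
)

plan_aws = spark.plan("aws", {"spark-worker": 4})
plan_openstack = spark.plan("openstack", {"spark-worker": 50})
```

In this sketch, only the cluster-oriented worker node declares a scaling range, so a request for 50 workers is capped at the declared maximum while the master stays at a single instance.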
