Article

Big data and machine learning framework for clouds and its usage for text classification

Journal

Publisher

WILEY
DOI: 10.1002/cpe.6164

Keywords

big data; cloud; machine learning; parallel and distributed execution; reference architectures; text classification

Funding

  1. Bolyai+ Scholarship for Young Higher Education Teachers and Researchers [UNKP-20-5-OE-73]
  2. European H2020 NEANIAS [863448]
  3. Hungarian Scientific Research Fund (OTKA) [K 132838]

The paper discusses reference architectures for big data and machine learning, focusing on an Apache Spark cluster with Jupyter and on Occopus, a cloud-agnostic orchestrator tool that automates deployment. The approach has been demonstrated and validated with a text classification application on the Hungarian academic research infrastructure.
Reference architectures for big data and machine learning comprise not only interconnected building blocks but also important considerations regarding, among others, scalability, manageability, and usability. Even with such reference architectures, the automated deployment of distributed toolsets and frameworks on various clouds remains challenging because of the diversity of technologies and protocols. The paper focuses on the widely used Apache Spark cluster with Jupyter as the target framework, and on the Occopus cloud-agnostic orchestrator tool for automating its deployment and maintenance stages. The presented approach has been demonstrated and validated with a new, promising text classification application on the Hungarian academic research infrastructure, the OpenStack-based MTA Cloud. The paper explains the concept and the applied components, and illustrates their usage with real use-case measurements.
