Improving Pipelining Tools for Pre-processing Data

INTERNATIONAL JOURNAL OF INTERACTIVE MULTIMEDIA AND ARTIFICIAL INTELLIGENCE (2022)

Journal

INTERNATIONAL JOURNAL OF INTERACTIVE MULTIMEDIA AND ARTIFICIAL INTELLIGENCE

Volume 7, Issue 4, Pages 214-224

Publisher

UNIV INT RIOJA-UNIR

DOI: 10.9781/ijimai.2021.10.004

Keywords

Burst Processing; Data Pre-processing; Java; Pipeline Frameworks

Funding

Xunta de Galicia [ED481D-2021/024]
Project "Semantic Knowledge Integration for Content-Based Spam Filtering", Spanish Ministry of Economy, Industry and Competitiveness (SMEIC) [TIN2017-84658-C2-1-R]
State Research Agency (SRA)
European Regional Development Fund (ERDF)
Conselleria de Educacion, Universidades e Formacion Profesional (Xunta de Galicia) [ED431C2018/55-GRC]

Automated Summary

Data mining has become a powerful tool for exploring unseen connections between variables and facts in different domains. However, current data analysis frameworks, specifically those using pipelining schemes, lack early error detection techniques and developer support mechanisms. This study introduces BDP4J, a new pipelining framework with features that address these limitations.

Abstract

Over the last several years, data mining has emerged and become a powerful tool that adds value to business and research. Data mining makes it possible to explore and find unseen connections between variables and facts observed in different domains, helping us to better understand reality. The programming methods and frameworks used to analyse data have evolved over time. Currently, pipelining schemes are the most reliable way of analysing data and, as a result, several important companies offer this kind of service. Moreover, several frameworks compatible with different programming languages are available for developing computational pipelines, and many research studies have addressed the optimization of data processing speed. However, as this study shows, early error detection techniques and developer support mechanisms are very limited in these frameworks. In this context, this study introduces several improvements: the design of different types of constraints for the early detection of errors, the creation of functions to facilitate debugging of concrete tasks included in a pipeline, the invalidation of erroneous instances, and the introduction of a burst-processing scheme. By adding these functionalities, we developed Big Data Pipelining for Java (BDP4J, https://github.com/sing-group/bdp4j), a fully functional new pipelining framework that shows the potential of these features.
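To make two of the ideas in the abstract concrete (constraints checked before any data is processed, and invalidation of erroneous instances), the following is a minimal Java sketch. It does not use or reproduce the BDP4J API: every name here (`Instance`, `Task`, `SerialPipeline`, `checkTypes`, `ParseIntTask`, `DoubleTask`) is hypothetical, and the type-compatibility rule is only one assumed example of what such a constraint could look like.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is an illustration, not the BDP4J API.

/** A unit of data flowing through the pipeline; it can be marked invalid. */
final class Instance {
    private Object data;
    private boolean valid = true;

    Instance(Object data) { this.data = data; }

    Object getData() { return data; }
    void setData(Object data) { this.data = data; }
    boolean isValid() { return valid; }
    void invalidate() { valid = false; }
}

/** A task that declares the data types it consumes and produces. */
interface Task {
    Class<?> inputType();
    Class<?> outputType();
    void process(Instance instance);
}

/** Parses a String into an Integer; invalidates instances that cannot be parsed. */
final class ParseIntTask implements Task {
    public Class<?> inputType() { return String.class; }
    public Class<?> outputType() { return Integer.class; }
    public void process(Instance instance) {
        try {
            instance.setData(Integer.valueOf(((String) instance.getData()).trim()));
        } catch (NumberFormatException e) {
            instance.invalidate(); // erroneous instance: later tasks skip it
        }
    }
}

/** Doubles an Integer. */
final class DoubleTask implements Task {
    public Class<?> inputType() { return Integer.class; }
    public Class<?> outputType() { return Integer.class; }
    public void process(Instance instance) {
        instance.setData((Integer) instance.getData() * 2);
    }
}

final class SerialPipeline {
    private final List<Task> tasks = new ArrayList<>();

    SerialPipeline add(Task task) { tasks.add(task); return this; }

    /** Early check: each task's output type must fit the next task's input type. */
    boolean checkTypes() {
        for (int i = 0; i + 1 < tasks.size(); i++) {
            Class<?> produced = tasks.get(i).outputType();
            Class<?> expected = tasks.get(i + 1).inputType();
            if (!expected.isAssignableFrom(produced)) {
                return false;
            }
        }
        return true;
    }

    /** Runs every task over every instance and returns those still valid. */
    List<Instance> processAll(List<Instance> instances) {
        if (!checkTypes()) {
            throw new IllegalStateException("Incompatible task types; nothing was processed");
        }
        List<Instance> result = new ArrayList<>();
        for (Instance instance : instances) {
            for (Task task : tasks) {
                if (!instance.isValid()) break;
                task.process(instance);
            }
            if (instance.isValid()) result.add(instance);
        }
        return result;
    }
}

final class PipelineSketchDemo {
    public static void main(String[] args) {
        SerialPipeline pipeline = new SerialPipeline()
                .add(new ParseIntTask())
                .add(new DoubleTask());
        List<Instance> input = new ArrayList<>();
        input.add(new Instance("21"));
        input.add(new Instance("oops")); // will be invalidated
        input.add(new Instance(" 4 "));
        for (Instance out : pipeline.processAll(input)) {
            System.out.println(out.getData()); // prints 42, then 8
        }
    }
}
```

In this sketch a mis-ordered pipeline (for example `DoubleTask` followed by `ParseIntTask`) is rejected by `checkTypes()` before any instance is touched, while a single malformed input only removes that one instance from the results.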
