Article

Intelligent ensembling of auto-ML system outputs for solving classification problems

Journal

INFORMATION SCIENCES
Volume 609, Pages 766-780

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2022.07.061

Keywords

Ensemble methods; Auto-ML; Grammatical evolution; Supervised learning

Funding

  1. University of Alicante
  2. Ministry of Science and Innovation of the Spanish Government
  3. Fondo Europeo de Desarrollo Regional (FEDER)
  4. Generalitat Valenciana (Conselleria d'Educacio, Investigacio, Cultura i Esport) [CIPROM/2021/21, PID2021-122263OB-C22, RTI2018-094653-B-C21/C22, PID2021-123956OB-I00]
  5. University of Havana


This paper presents a two-phase optimization system that utilizes Auto-ML tools to solve classification problems and generate more robust classifiers. The experimental results show that ensembling a subset of already tested models can build a better solution, and ensuring diversity using the double-fault measure produces better results.
Automatic Machine Learning (Auto-ML) tools enable the automatic solution of real-world problems through machine learning techniques. These tools tend to be more time-consuming than standard machine learning libraries; therefore, fully exploiting all the available resources is a valuable feature. This paper presents a two-phase optimization system for solving classification problems. The system is designed to produce more robust classifiers by exploiting the different architectures that are generated while solving classification problems with Auto-ML tools, particularly AutoGOAL. In the first phase, the system follows a probabilistic strategy to find the best combination of algorithms and hyperparameters to generate a collection of base models according to certain diversity criteria; in the second, it follows similar Auto-ML strategies to ensemble those models. The HAHA 2019 challenge corpus and the Adult dataset were used to evaluate the system. The experimental results show that: i) a better solution can be built by ensembling a subset of the already tested models; ii) the performance of ensemble methods depends on the collection of base models used; and iii) ensuring diversity using the double-fault measure produces better results than the disagreement measure. The source code is available online for the research community. (c) 2022 Elsevier Inc. All rights reserved.
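The abstract contrasts two pairwise diversity measures for selecting base models: the disagreement measure (fraction of samples on which exactly one of the two classifiers is correct) and the double-fault measure (fraction of samples both classifiers misclassify). The following is a minimal sketch of these two standard measures; the function name and example labels are illustrative, not taken from the paper's code.

```python
import numpy as np

def pairwise_diversity(y_true, pred_a, pred_b):
    """Return (disagreement, double_fault) for two classifiers.

    disagreement: proportion of samples where exactly one classifier is correct.
    double_fault: proportion of samples where both classifiers are wrong.
    """
    y_true = np.asarray(y_true)
    a_correct = np.asarray(pred_a) == y_true
    b_correct = np.asarray(pred_b) == y_true
    n = len(y_true)
    disagreement = np.sum(a_correct != b_correct) / n   # one right, one wrong
    double_fault = np.sum(~a_correct & ~b_correct) / n  # both wrong
    return disagreement, double_fault

# Illustrative toy labels (not from the paper's datasets):
y_true = [0, 1, 1, 0]
pred_a = [0, 1, 0, 1]
pred_b = [0, 0, 0, 1]
dis, df = pairwise_diversity(y_true, pred_a, pred_b)
print(dis, df)  # 0.25 0.5
```

A lower double-fault value indicates the pair rarely fails on the same samples, which is the property the paper's first phase reportedly favors when assembling the collection of base models.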

Authors

