Article

Intelligent ensembling of auto-ML system outputs for solving classification problems

Journal

INFORMATION SCIENCES
Volume 609, Pages 766-780

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2022.07.061

Keywords

Ensemble methods; Auto-ML; Grammatical evolution; Supervised learning

Funding

  1. University of Alicante
  2. Ministry of Science and Innovation of the Spanish Government
  3. Fondo Europeo de Desarrollo Regional (FEDER)
  4. Generalitat Valenciana (Conselleria d'Educacio, Investigacio, Cultura i Esport) [CIPROM/2021/21, PID2021-122263OB-C22, RTI2018-094653-B-C21/C22, PID2021-123956OB-I00]
  5. University of Havana

Abstract

This paper presents a two-phase optimization system that utilizes Auto-ML tools to solve classification problems and generate more robust classifiers. The experimental results show that ensembling a subset of already tested models can build a better solution, and ensuring diversity using the double-fault measure produces better results.
Automatic Machine Learning (Auto-ML) tools enable the automatic solution of real-world problems through machine learning techniques. These tools tend to be more time-consuming than standard machine learning libraries; therefore, exploiting all the available resources in full is a valuable feature. This paper presents a two-phase optimization system for solving classification problems. The system is designed to produce more robust classifiers by exploiting the different architectures that are generated while solving classification problems with Auto-ML tools, particularly AutoGOAL. In the first phase, the system follows a probabilistic strategy to find the best combination of algorithms and hyperparameters to generate a collection of base models according to certain diversity criteria; in the second, it follows similar Auto-ML strategies to ensemble those models. The HAHA 2019 challenge corpus and the Adult dataset were used to evaluate the system. The experimental results show that: i) a better solution can be built by ensembling a subset of the already tested models; ii) the performance of ensemble methods depends on the collection of base models used; and iii) ensuring diversity using the double-fault measure produces better results than the disagreement measure. The source code is available online for the research community. (c) 2022 Elsevier Inc. All rights reserved.
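
The abstract contrasts two pairwise diversity criteria for selecting base models: the disagreement measure and the double-fault measure. The sketch below is not taken from the paper's implementation; it shows one standard way to compute both measures for a set of already-fitted scikit-learn-style classifiers, and the function names (including the pairwise_diversity helper) are illustrative only.

import numpy as np

def disagreement(errors_a, errors_b):
    # Disagreement measure: fraction of samples where exactly one of the
    # two classifiers is wrong (higher values mean more diverse models).
    return np.mean(errors_a != errors_b)

def double_fault(errors_a, errors_b):
    # Double-fault measure: fraction of samples misclassified by both
    # classifiers (lower values mean more complementary models).
    return np.mean(errors_a & errors_b)

def pairwise_diversity(models, X, y, measure=double_fault):
    # Compute the chosen pairwise diversity measure for every pair of
    # already-fitted base models on a validation set (X, y).
    errors = [model.predict(X) != y for model in models]  # boolean error vectors
    n = len(models)
    scores = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            scores[i, j] = scores[j, i] = measure(errors[i], errors[j])
    return scores

Under the double-fault criterion a collection of base models would be chosen to keep these pairwise scores low, since two models that rarely fail on the same samples are more useful to an ensemble; under the disagreement criterion, higher scores are preferred.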
