Article

On taking advantage of opportunistic meta-knowledge to reduce configuration spaces for automated machine learning

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 239

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2023.122359

Keywords

AutoML; Automated machine learning; Opportunistic meta-knowledge; Search space reduction; Configuration space reduction; ML pipeline composition and optimisation


Summary

The paper addresses the problem of efficiently optimising machine learning (ML) solutions by reducing the configuration space of ML pipelines using historical performance data. Experiments show that both opportunistic and systematic meta-knowledge can improve ML outcomes, and that configuration-space culling works best when it is neither too conservative nor too radical. The utility and impact of meta-knowledge depend on many facets of how it is generated and exploited; in particular, identifying 'difficult' datasets is crucial to building informative meta-knowledge bases.

Abstract

The optimisation of a machine learning (ML) solution is a core research problem in the field of automated machine learning (AutoML). This process can require searching through complex configuration spaces of not only ML components and their hyperparameters but also ways of composing them together, i.e. forming ML pipelines. Optimisation efficiency and the model accuracy attainable for a fixed time budget suffer if this pipeline configuration space is excessively large. A key research question is whether it is both possible and practical to preemptively avoid costly evaluations of poorly performing ML pipelines by leveraging their historical performance for various ML tasks, i.e. meta-knowledge. This paper approaches the research question by first formulating the problem of configuration space reduction in the context of AutoML. Given a pool of available ML components, it then investigates whether previous experience can recommend the most promising subset to use as a configuration space when initiating a pipeline composition/optimisation process for a new ML problem, i.e. running AutoML on a new dataset. Specifically, we conduct experiments to explore (1) what size the reduced search space should be and (2) which strategy to use when recommending the most promising subset. The previous experience comes in the form of classifier/regressor accuracy rankings derived from either (1) a substantial but non-exhaustive number of pipeline evaluations made during historical AutoML runs, i.e. 'opportunistic' meta-knowledge, or (2) comprehensive cross-validated evaluations of classifiers/regressors with default hyperparameters, i.e. 'systematic' meta-knowledge. 
Overall, numerous experiments with the AutoWeka4MCPS package, including ones leveraging similarities between datasets via the relative landmarking method, suggest that (1) opportunistic/systematic meta-knowledge can improve ML outcomes, typically in line with how relevant that meta-knowledge is, and (2) configuration-space culling is optimal when it is neither too conservative nor too radical. However, the utility and impact of meta-knowledge depend critically on numerous facets of its generation and exploitation, warranting extensive analysis; these are often overlooked/underappreciated within AutoML and meta-learning literature. In particular, we observe strong sensitivity to the 'challenge' of a dataset, i.e. whether specificity in choosing a predictor leads to significantly better performance. Ultimately, identifying 'difficult' datasets, thus defined, is crucial to both generating informative meta-knowledge bases and understanding optimal search-space reduction strategies.
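To make the subset-recommendation idea concrete, the following is a minimal sketch (not the authors' implementation) of one plausible strategy: aggregate historical classifier/regressor accuracy rankings by mean rank and keep only the top-k components as the reduced configuration space. The component names and rankings are illustrative.

```python
def recommend_subset(historical_rankings, k):
    """Recommend the k components with the best (lowest) mean rank
    across historical datasets, i.e. a reduced configuration space.

    historical_rankings: list of dicts mapping component name -> rank
    (1 = best) observed on one historical ML task.
    """
    ranks = {}
    for ranking in historical_rankings:
        for component, r in ranking.items():
            ranks.setdefault(component, []).append(r)
    mean_rank = {c: sum(rs) / len(rs) for c, rs in ranks.items()}
    # Sort by mean rank; ties resolve by stable sort order.
    return sorted(mean_rank, key=mean_rank.get)[:k]


# Illustrative meta-knowledge: rankings from three historical datasets.
rankings = [
    {"RandomForest": 1, "SVM": 2, "NaiveBayes": 3, "kNN": 4},
    {"SVM": 1, "RandomForest": 2, "kNN": 3, "NaiveBayes": 4},
    {"RandomForest": 1, "kNN": 2, "SVM": 3, "NaiveBayes": 4},
]
print(recommend_subset(rankings, 2))  # ['RandomForest', 'SVM']
```

The choice of k encodes the conservative-versus-radical culling trade-off the paper examines: a large k barely shrinks the search, while a small k risks discarding the component that would have performed best on the new dataset.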


