Article

A Machine Learning Approach for an HPC Use Case: the Jobs Queuing Time Prediction

Publisher

Elsevier
DOI: 10.1016/j.future.2023.01.020

Keywords

High performance computing; Queues; Batch scheduler; Automatism; Machine learning; Uncertainty quantification


The HPC domain has provided the necessary tools for scientific and industrial advancements. Supercomputers have been crucial for developing and testing advanced technologies. This study focuses on applying machine learning techniques to predict the waiting time of jobs in the queue of real supercomputers.
The High-Performance Computing (HPC) domain has provided the tools that underpin the scientific and industrial advances of the last decades. HPC is a broad field that aims to deliver both software and hardware solutions, as well as methodologies for achieving goals of interest such as system performance and energy efficiency. In this context, supercomputers have been the vehicle for developing and testing the most advanced technologies since their first appearance. Unlike cloud computing resources, which are provided to end-users on demand in the form of virtualized resources (i.e., virtual machines and containers), supercomputers' resources are generally served through state-of-the-art batch schedulers (e.g., SLURM, PBS, LSF, HTCondor). Users submit their computational jobs to the system, which manages their execution with the support of queues. In this regard, predicting the behaviour of jobs in the batch scheduler queues becomes worthwhile: in many cases, a deeper knowledge of the time a job spends in a queue (e.g., when submitting check-pointed jobs or jobs with execution dependencies) allows exploring more effective workflow orchestration policies. In this work, we focus on applying machine learning (ML) techniques to historical data collected from the queuing systems of real supercomputers, aiming to predict the time a given job spends in a queue. Specifically, we apply both unsupervised learning (UL) and supervised learning (SL) techniques to identify the most effective features for the prediction task and to perform the actual prediction of the queue waiting time. For this purpose, two approaches are explored: on one side, the prediction of ranges of jobs' queuing times (classification approach) and, on the other, the prediction of the waiting time at the minute level (regression approach).
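The two framings described above, range classification versus minute-level regression, can be sketched with a toy example. This is an illustrative sketch only, not the authors' actual pipeline: the features (requested nodes, requested walltime), the bin edges, and the k-nearest-neighbour model are all assumptions made for demonstration.

```python
# Illustrative sketch: two framings of queue wait-time prediction.
# Features, bin edges, and the k-NN model are assumptions, not the paper's method.

def bin_wait(minutes):
    """Map a wait time in minutes to a coarse range label (classification target)."""
    if minutes < 10:
        return "short"
    if minutes < 60:
        return "medium"
    return "long"

def knn_predict(history, query, k=3):
    """Tiny k-nearest-neighbour regressor over (features, wait_minutes) records."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(history, key=lambda rec: dist(rec[0], query))[:k]
    return sum(wait for _, wait in nearest) / k

# Hypothetical history: (requested_nodes, requested_walltime_h) -> observed wait (min)
history = [((2, 1), 5), ((4, 2), 12), ((64, 24), 240), ((128, 24), 300), ((2, 2), 8)]

wait = knn_predict(history, (4, 1))   # regression: minutes-level estimate
label = bin_wait(wait)                # classification: coarse range label
```

The same historical features feed both tasks; only the target changes, which is why the two approaches can be compared on a common dataset split.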
Experimental results highlight the strong relationship between the SL models' performance and the way the dataset is split. After the prediction step, we present an uncertainty quantification approach, i.e., a tool that associates the predictions with reliability metrics based on variance estimation. (c) 2023 Published by Elsevier B.V.
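As a rough illustration of variance-based uncertainty quantification, a point prediction can be paired with the spread of an ensemble of estimates; the paper's exact estimator is not reproduced here, and the ensemble values below are invented for demonstration.

```python
# Hedged sketch: report the variance of an ensemble of wait-time estimates
# as a reliability metric alongside the mean (point) prediction.

def predict_with_uncertainty(samples):
    """Return (mean prediction, variance) over an ensemble of wait-time estimates."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return mean, var

# Hypothetical ensemble: wait-time estimates (minutes) from k similar historical jobs
estimates = [10.0, 12.0, 11.0, 13.0]
mean, var = predict_with_uncertainty(estimates)  # mean = 11.5, var = 1.25
```

A large variance flags a prediction the scheduler-side user should treat with caution, which is the practical role of the reliability metric described in the abstract.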
