Article

Comparative Evaluation and Comprehensive Analysis of Machine Learning Models for Regression Problems

Journal

DATA INTELLIGENCE
Volume 4, Issue 3, Pages 620-652

Publisher

MIT PRESS
DOI: 10.1162/dint_a_00155

Keywords

Machine learning; Regression; Comparative evaluation; Analysis; Validation

Abstract

This study analyzes the performance of machine learning models on different datasets, considering various training strategies and evaluation metrics. The results demonstrate that the deep Long Short-Term Memory (LSTM) neural network outperforms the other models and that cross-validation has a significant impact on the experimental results.
Artificial intelligence and machine learning applications are of significant importance in almost every field of human life, either to solve problems or to support human experts. However, determining which machine learning model achieves a superior result for a particular problem across the wide range of real-life application areas is still a challenging task for researchers. The success of a model can be affected by several factors, such as dataset characteristics, training strategy, and model responses. Therefore, a comprehensive analysis is required to determine model ability and the efficiency of the considered strategies. This study implemented ten benchmark machine learning models on seventeen varied datasets. Experiments were performed using four different training strategies: 60:40, 70:30, and 80:20 hold-out splits and five-fold cross-validation. Three evaluation metrics were used to assess the experimental results: mean squared error, mean absolute error, and the coefficient of determination (R² score). The considered models are analyzed, and each model's advantages, disadvantages, and data dependencies are indicated. As a result of the extensive set of experiments, the deep Long Short-Term Memory (LSTM) neural network outperformed the other considered models, namely decision tree, linear regression, support vector regression with linear and radial basis function kernels, random forest, gradient boosting, extreme gradient boosting, a shallow neural network, and a deep neural network. It was also shown that cross-validation has a substantial impact on the experimental results and should be considered for model evaluation in regression studies where data mining or selection is not performed.
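The following is a minimal sketch (not the authors' code) of the evaluation protocol described in the abstract: comparing hold-out splits (60:40, 70:30, 80:20) against five-fold cross-validation for a regression model, scored with mean squared error, mean absolute error, and R². It assumes scikit-learn, a synthetic dataset, and a random forest regressor as stand-ins; the paper's actual datasets and models differ.

```python
# Illustrative sketch: hold-out splits vs. five-fold cross-validation
# for a regression model, scored by MSE, MAE, and R^2.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split, KFold
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Synthetic data and a placeholder model (the paper uses 17 real datasets
# and 10 benchmark models).
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
model = RandomForestRegressor(random_state=0)

# Hold-out strategies: 60:40, 70:30, and 80:20 train/test splits.
for test_size in (0.4, 0.3, 0.2):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=0)
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    print(f"hold-out {int((1 - test_size) * 100)}:{int(test_size * 100)} -> "
          f"MSE={mean_squared_error(y_te, pred):.2f}, "
          f"MAE={mean_absolute_error(y_te, pred):.2f}, "
          f"R2={r2_score(y_te, pred):.3f}")

# Five-fold cross-validation: metrics averaged over the folds.
mse, mae, r2 = [], [], []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    mse.append(mean_squared_error(y[test_idx], pred))
    mae.append(mean_absolute_error(y[test_idx], pred))
    r2.append(r2_score(y[test_idx], pred))
print(f"5-fold CV -> MSE={np.mean(mse):.2f}, "
      f"MAE={np.mean(mae):.2f}, R2={np.mean(r2):.3f}")
```

Because cross-validation averages performance over folds that each serve once as the test set, its scores are typically more stable than a single hold-out split, which is consistent with the abstract's point about its impact on regression evaluation.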
