Article

A comparative study between PCR, PLSR, and LW-PLS on the predictive performance at different data splitting ratios

CHEMICAL ENGINEERING COMMUNICATIONS (2022)

Journal

CHEMICAL ENGINEERING COMMUNICATIONS

Volume 209, Issue 11, Pages 1439-1456

Publisher

TAYLOR & FRANCIS INC

DOI: 10.1080/00986445.2021.1957853

Keywords

Data splitting; locally weighted partial least square regression; partial least square regression; prediction; principal component regression; soft sensors

Funding

Curtin University Malaysia

Automated Summary

Principal component regression (PCR), partial least squares regression (PLSR), and locally weighted partial least squares (LW-PLS) models were compared for their predictive performance at different data splitting ratios. LW-PLS performed better in most case studies because it copes better with nonlinear data. Optimal splitting ratios were determined by evaluating the root mean squared error, coefficient of determination, and error of approximation across five case studies. For the best model in each case study, split-sample ratios above 70% training data gave major improvements in predictive performance relative to the base scenarios, which had the largest error-of-approximation values.

Abstract

Principal component regression (PCR), partial least squares regression (PLSR), and locally weighted partial least squares (LW-PLS) are supervised learning methods in which a labeled dataset is used to train the model. These models are normally trained with split-sample validation, in which a dataset is split into training and testing sets that are used to develop and evaluate the model, respectively. However, few studies have evaluated the prediction performance of PCR, PLSR, and LW-PLS models at different data splitting ratios. To address this research gap, this work investigates the predictive performance of these regression models at different split-sample ratios. It also determines the optimal splitting ratios for PCR, PLSR, and LW-PLS models via a simple data splitting method in which a minimum of 50% of the entire dataset is allocated to training. The optimal split is determined by evaluating the root mean squared error, coefficient of determination, and error of approximation (E_a) for five case studies. Among the three models, LW-PLS performed better in most of the case studies since it copes better with nonlinear data. For the best model in each case study, split-sample ratios above 70% training data allowed major improvements in predictive performance compared with the base scenarios, which have the largest E_a values.
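
The split-sample procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the synthetic data, the training ratios (0.5 to 0.9), the random shuffling, the number of latent components, and all function names are assumptions made for illustration. Only PCR and PLSR (single-response NIPALS) are shown; LW-PLS and the error of approximation are omitted because their exact formulations are not given here.

```python
import numpy as np

def fit_pcr(X, y, n_components):
    """PCR: regress centered y on the leading principal component scores."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                      # loadings, shape (p, k)
    gamma, *_ = np.linalg.lstsq(Xc @ V, y - y_mean, rcond=None)
    return x_mean, y_mean, V @ gamma             # coefficients in X space

def fit_pls1(X, y, n_components):
    """PLSR for a single response via NIPALS with deflation."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xc.T @ yc
        w = w / np.linalg.norm(w)                # weight vector
        t = Xc @ w                               # scores
        tt = t @ t
        p = Xc.T @ t / tt                        # X loadings
        c = yc @ t / tt                          # y loading
        Xc = Xc - np.outer(t, p)                 # deflate X
        yc = yc - c * t                          # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, beta

def predict(model, X):
    x_mean, y_mean, beta = model
    return (X - x_mean) @ beta + y_mean

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def r2(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def evaluate_splits(X, y, ratios=(0.5, 0.6, 0.7, 0.8, 0.9), n_components=3, seed=0):
    """Return {(model_name, ratio): (test RMSE, test R^2)} for each training ratio."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))              # assumption: random shuffle before splitting
    results = {}
    for ratio in ratios:
        n_train = int(round(ratio * len(y)))
        train, test = order[:n_train], order[n_train:]
        for name, fit in (("PCR", fit_pcr), ("PLSR", fit_pls1)):
            model = fit(X[train], y[train], n_components)
            y_hat = predict(model, X[test])
            results[(name, ratio)] = (rmse(y[test], y_hat), r2(y[test], y_hat))
    return results

def make_demo_data(n=200, seed=0):
    """Synthetic collinear data: 6 measured variables driven by 3 latent factors."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n, 3))
    X = latent @ rng.normal(size=(3, 6)) + 0.05 * rng.normal(size=(n, 6))
    y = latent @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=n)
    return X, y

if __name__ == "__main__":
    X, y = make_demo_data()
    for (name, ratio), (err, score) in evaluate_splits(X, y).items():
        print(f"{name:5s} train={ratio:.0%}  RMSE={err:.3f}  R2={score:.3f}")
```

Running the script prints one line per model and training ratio, so the test-set RMSE and R² can be compared across splits. On real process data the ranking of ratios depends on the dataset, which is why the study evaluates five separate case studies.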
