Article

A comparative study between PCR, PLSR, and LW-PLS on the predictive performance at different data splitting ratios

Journal

CHEMICAL ENGINEERING COMMUNICATIONS
Volume 209, Issue 11, Pages 1439-1456

Publisher

TAYLOR & FRANCIS INC
DOI: 10.1080/00986445.2021.1957853

Keywords

Data splitting; locally weighted partial least square regression; partial least square regression; prediction; principal component regression; soft sensors

Funding

  1. Curtin University Malaysia

Principal component regression (PCR), partial least squares regression (PLSR), and locally weighted partial least squares (LW-PLS) models were compared for predictive performance at different data splitting ratios. LW-PLS performed better in most case studies because it copes better with nonlinear data. Optimal splitting ratios were determined by evaluating the root mean squared error, coefficient of determination, and error of approximation (E-a) for five case studies. For the best model in each case study, split-sample ratios above 70% training data gave major improvements in predictive performance over the base scenarios, which have the largest E-a values.
Principal component regression (PCR), partial least squares regression (PLSR), and locally weighted partial least squares (LW-PLS) are supervised learning methods in which a labeled dataset is used to train the model. These models are normally developed with split-sample validation, in which a dataset is split into a training set used to build the model and a testing set used to evaluate it. However, few studies have evaluated the prediction performance of PCR, PLSR, and LW-PLS models at different data splitting ratios. To address this research gap, this work investigates the predictive performance of these regression models at different split-sample ratios. It also determines the optimal splitting ratios for PCR, PLSR, and LW-PLS models via a simple data splitting method in which a minimum of 50% of the entire dataset is allocated to training. The optimal split is determined by evaluating the root mean squared error, coefficient of determination, and error of approximation (E-a) for five case studies. Among the three models, LW-PLS performed better in most of the case studies because it copes better with nonlinear data. For the best model in each case study, split-sample ratios above 70% training data gave major improvements in predictive performance compared with the base scenarios, which have the largest E-a values.
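
The split-sample procedure described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' code or data: the data are synthetic, the model is a univariate ordinary least squares line standing in for PCR, PLSR, or LW-PLS, and the helper names (`rmse`, `r_squared`, `fit_line`, `evaluate_splits`) are hypothetical. Only the root mean squared error and the coefficient of determination are computed; the error of approximation (E-a) is left out because the abstract does not define it.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Stand-in model: univariate ordinary least squares (NOT PCR/PLSR/LW-PLS)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def evaluate_splits(x, y, ratios, seed=0):
    """Return {ratio: (rmse, r2)} measured on the held-out part for each training ratio."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)  # one fixed shuffle shared by all ratios
    results = {}
    for ratio in ratios:
        n_train = int(round(ratio * len(idx)))
        train, test = idx[:n_train], idx[n_train:]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        y_test = [y[i] for i in test]
        y_hat = [slope * x[i] + intercept for i in test]
        results[ratio] = (rmse(y_test, y_hat), r_squared(y_test, y_hat))
    return results


# Synthetic, made-up data: a noisy straight line, purely for illustration.
rng = random.Random(42)
x = [i / 10 for i in range(200)]
y = [2.0 * xi + 1.0 + rng.gauss(0, 0.5) for xi in x]

# Training ratios start at 50%, mirroring the minimum stated in the abstract.
ratios = [0.5, 0.6, 0.7, 0.8, 0.9]
results = evaluate_splits(x, y, ratios)
for ratio, (err, r2) in results.items():
    print(f"train={ratio:.0%}  RMSE={err:.3f}  R2={r2:.3f}")

# Simplification: pick the ratio with the lowest test RMSE only.
best_ratio = min(results, key=lambda r: results[r][0])
```

Selecting the ratio by test RMSE alone is a simplification of this sketch; the study weighs three metrics together across five case studies. Swapping `fit_line` for a PCR, PLSR, or LW-PLS implementation would leave the splitting and scoring loop unchanged.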
