Article

A comparative study between PCR, PLSR, and LW-PLS on the predictive performance at different data splitting ratios

CHEMICAL ENGINEERING COMMUNICATIONS (2022)

Journal

CHEMICAL ENGINEERING COMMUNICATIONS

Volume 209, Issue 11, Pages 1439-1456

Publisher

TAYLOR & FRANCIS INC

DOI: 10.1080/00986445.2021.1957853

Keywords

Data splitting; locally weighted partial least square regression; partial least square regression; prediction; principal component regression; soft sensors

Category

Engineering, Chemical

Funding

Curtin University Malaysia

AI Summary

Principal component regression (PCR), partial least squares regression (PLSR), and locally weighted partial least squares (LW-PLS) models were compared for their predictive performance at different data splitting ratios. LW-PLS performed better in most case studies because it copes better with nonlinear data. Optimal splitting ratios were determined by evaluating the root mean squared error, coefficient of determination, and error of approximation (E_a) for five case studies. For the best model in each case study, split-sample ratios with more than 70% training data gave major improvements in predictive performance over the base scenarios, which had the largest E_a values.

Abstract

Principal component regression (PCR), partial least squares regression (PLSR), and locally weighted partial least squares (LW-PLS) models are supervised learning methods in which a labeled dataset is used to train the model. These models are normally trained with split-sample validation, where a dataset is split into training and testing datasets to develop and evaluate the model. However, few studies have evaluated the prediction performance of PCR, PLSR, and LW-PLS models at different data splitting ratios. To address this research gap, this work investigates the predictive performance of these regression models at different split-sample ratios. The study also determines the optimal splitting ratios for PCR, PLSR, and LW-PLS models via a simple data splitting method in which a minimum of 50% of the entire dataset is allocated to train the model. The optimal split is determined by evaluating the root mean squared error, coefficient of determination, and error of approximation (E_a) for five case studies. Among the PCR, PLSR, and LW-PLS models, LW-PLS performed better in most of the case studies since it copes better with nonlinear data. Among the best models in each case study, split-sample ratios with more than 70% training data yielded major improvements in predictive performance compared to their base scenarios, which have the largest E_a values.
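The split-sample procedure described in the abstract can be illustrated with a minimal sketch. It sweeps training fractions from 50% upward and reports the root mean squared error and coefficient of determination on the held-out data. Everything here is an assumption made for illustration: the function names, the synthetic data, and the list of fractions are hypothetical, and a one-variable least-squares line stands in for the PCR, PLSR, and LW-PLS models, which are not implemented. The error of approximation (E_a) is omitted because its definition is not given in this text.

```python
import math
import random


def split_sample(data, train_fraction, seed=0):
    """Shuffle the rows and split them into training and testing sets."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    rows = list(data)
    random.Random(seed).shuffle(rows)
    n_train = int(round(train_fraction * len(rows)))
    return rows[:n_train], rows[n_train:]


def fit_line(train):
    """Least-squares fit of y = a + b*x (a stand-in, not PCR/PLSR/LW-PLS)."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def sweep_split_ratios(data, fractions=(0.5, 0.6, 0.7, 0.8, 0.9), seed=0):
    """Return {train_fraction: (test RMSE, test R^2)} for each split ratio."""
    results = {}
    for fraction in fractions:
        train, test = split_sample(data, fraction, seed)
        intercept, slope = fit_line(train)
        y_true = [y for _, y in test]
        y_pred = [intercept + slope * x for x, _ in test]
        results[fraction] = (rmse(y_true, y_pred), r_squared(y_true, y_pred))
    return results


if __name__ == "__main__":
    # Synthetic noisy linear data, purely illustrative (not from the paper).
    rng = random.Random(42)
    data = [(i / 10.0, 2.0 * (i / 10.0) + 1.0 + rng.gauss(0.0, 0.2))
            for i in range(100)]
    for fraction, (err, r2) in sweep_split_ratios(data).items():
        print(f"train={fraction:.0%}  RMSE={err:.3f}  R2={r2:.3f}")
```

In a fuller comparison, the stand-in `fit_line` would be replaced by each regression model in turn, and the sweep would be repeated per case study so that the split ratio with the best test metrics can be identified for each model.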
