Article

Determination of optimum number of components in partial least squares regression from distributions of the root-mean-squared error obtained by Monte Carlo resampling

Journal

JOURNAL OF CHEMOMETRICS
Volume 32, Issue 4, Pages -

Publisher

WILEY
DOI: 10.1002/cem.2993

Keywords

model dimension; model selection; model validation; Monte Carlo resampling; PLS regression

Funding

  1. Forde Health Trust
  2. Norges Forskningsrad [221047/F40]
  3. Norwegian Research Council

Monte Carlo resampling is used to determine the number of components in partial least squares (PLS) regression. The data are randomly and repeatedly divided into calibration and validation samples. For each repetition, the root-mean-squared error (RMSE) of the validation samples is calculated for a = 1, 2, ..., A PLS components, which yields a distribution of RMSE values for each number of PLS components. The median RMSE is determined from each of these distributions, and the number of components with the lowest median RMSE, A(min), is located. Next, the fraction p of the RMSE values for A(min) that exceed the median RMSE for the preceding component is determined. This fraction p serves as a probability measure for deciding, against a preselected threshold p(upper), whether the RMSE for A(min) PLS components is significantly lower than the RMSE for the preceding component. If so, A(min) defines the optimum number of PLS components. If not, the process is repeated for the previous components until significance is achieved. Setting p(upper) = 0.5 implies that the median is used for selecting the optimum number of components. The RMSE is approximately normally distributed on the smallest components, which can be used to relate p to a fraction of a standard deviation. For instance, p = 0.308 corresponds to half a standard deviation if the RMSE is normally distributed. The approach is demonstrated for calibration of metabolomics measurements and spectroscopic mixture data.
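The selection step described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the RMSE values are assumed to be already available as a matrix with one row per Monte Carlo repetition and one column per number of components; the function names `select_n_components` and `normal_tail_fraction` are hypothetical; "significant" is taken to mean p <= p(upper); and "repeated for the previous components" is read as stepping the candidate down by one and comparing it with its own predecessor. The abstract does not spell out these last two details.

```python
import math

import numpy as np


def select_n_components(rmse, p_upper=0.5):
    """Pick a number of PLS components from resampled RMSE values.

    rmse: array of shape (n_repetitions, A); column a-1 holds the
          validation RMSE values obtained with a components.
    Returns (n_components, p), where p is the fraction that triggered
    the decision (None if the search fell back to one component).
    """
    rmse = np.asarray(rmse, dtype=float)
    medians = np.median(rmse, axis=0)

    # A_min: the number of components with the lowest median RMSE (1-based).
    candidate = int(np.argmin(medians)) + 1

    while candidate > 1:
        prev_median = medians[candidate - 2]
        # Fraction of the candidate's RMSE values that exceed the
        # median RMSE of the preceding number of components.
        p = float(np.mean(rmse[:, candidate - 1] > prev_median))
        # Assumption: p <= p_upper counts as "significantly lower".
        if p <= p_upper:
            return candidate, p
        # Assumption: otherwise test the previous component the same way.
        candidate -= 1

    return 1, None


def normal_tail_fraction(k):
    """Upper-tail probability of a normal distribution beyond k standard deviations."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))


if __name__ == "__main__":
    # Made-up RMSE values: 5 repetitions, 3 candidate model sizes.
    demo = np.array([
        [1.0, 0.5, 0.60],
        [1.1, 0.6, 0.65],
        [1.2, 0.7, 0.69],
        [1.3, 0.8, 0.75],
        [1.4, 0.9, 0.80],
    ])
    print(select_n_components(demo, p_upper=0.5))
    print(select_n_components(demo, p_upper=0.308))
    print(round(normal_tail_fraction(0.5), 4))
```

In the made-up data, the third column has the lowest median but only a marginal improvement over the second, so a stricter threshold moves the choice to fewer components. The helper `normal_tail_fraction` reproduces the link between p and a fraction of a standard deviation under the normality assumption mentioned in the abstract.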
