4.5 Article

The prediction error in CLS and PLS: the importance of feature selection prior to multivariate calibration

Journal

JOURNAL OF CHEMOMETRICS
Volume 19, Issue 2, Pages 107-118

Publisher

WILEY
DOI: 10.1002/cem.915

Keywords

classical least squares; partial least squares; prediction error; dimensional reduction; feature selection

Abstract

Classical least squares (CLS) and partial least squares (PLS) are two common multivariate regression algorithms in chemometrics. This paper presents an asymptotically exact mathematical analysis of the mean squared error of prediction of CLS and PLS under the linear mixture model commonly assumed in spectroscopy. For CLS regression with a very large calibration set, the root mean squared error is approximately equal to the noise per wavelength divided by the length of the net analyte signal vector. It is shown, however, that for a finite training set with n samples in p dimensions there are additional error terms that depend on σ²p²/n², where σ is the noise level per co-ordinate. Therefore, in the 'large p, small n' regime common in spectroscopy, these terms can be quite large and even dominate the overall prediction error. It is demonstrated both theoretically and by simulations that dimensional reduction of the input data via their compact representation with a few features, selected for example by adaptive wavelet compression, can substantially decrease these effects and recover the asymptotic error. This analysis provides a theoretical justification for the need to perform feature selection (dimensional reduction) of the input data prior to application of multivariate regression algorithms. Copyright © 2005 John Wiley & Sons, Ltd.
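
The abstract's central claim, that reducing p before regression suppresses the finite-sample error terms scaling with σ²p²/n², can be illustrated with a small simulation. The sketch below is not the authors' code: it uses a plain CLS estimator and simple contiguous wavelength binning as a stand-in for the adaptive wavelet compression mentioned in the abstract, and all model details (Gaussian band shapes, the values of p, n, σ, and the number of retained features m) are arbitrary choices made for illustration only.

```python
# Illustrative sketch (not the authors' code): compare CLS prediction error on
# raw high-dimensional spectra vs. on a small set of binned features, under a
# linear mixture model with additive white noise. All parameters are arbitrary.
import numpy as np

rng = np.random.default_rng(0)

p, n, n_test, k = 1000, 20, 2000, 3   # wavelengths, calibration size, test size, components
sigma = 0.1                            # noise level per coordinate (sigma in the abstract)
m = 20                                 # number of retained features after binning

# Pure-component spectra: three Gaussian bands on a unit wavelength grid.
grid = np.linspace(0.0, 1.0, p)
S = np.stack([np.exp(-0.5 * ((grid - mu) / 0.08) ** 2) for mu in (0.3, 0.5, 0.7)])  # k x p

def simulate(n_samples):
    """Draw concentrations and noisy mixture spectra X = C S + noise."""
    C = rng.uniform(0.2, 1.0, size=(n_samples, k))
    X = C @ S + sigma * rng.standard_normal((n_samples, p))
    return C, X

def cls_rmsep(X_cal, C_cal, X_test, C_test):
    """CLS: estimate the pure spectra from the calibration set, predict test
    concentrations by least squares, and return the RMSEP for component 1."""
    K_hat, *_ = np.linalg.lstsq(C_cal, X_cal, rcond=None)   # k x p estimate of S
    C_pred = X_test @ np.linalg.pinv(K_hat)                 # n_test x k predictions
    return np.sqrt(np.mean((C_pred[:, 0] - C_test[:, 0]) ** 2))

def bin_features(X):
    """Crude dimensional reduction: average p // m contiguous wavelengths per
    bin (a stand-in for the adaptive wavelet compression cited in the paper)."""
    return X.reshape(X.shape[0], m, p // m).mean(axis=2)

C_cal, X_cal = simulate(n)
C_test, X_test = simulate(n_test)

err_full = cls_rmsep(X_cal, C_cal, X_test, C_test)                                # large p, small n
err_binned = cls_rmsep(bin_features(X_cal), C_cal, bin_features(X_test), C_test)  # m features only

print(f"RMSEP, CLS on all {p} wavelengths : {err_full:.4f}")
print(f"RMSEP, CLS on {m} binned features : {err_binned:.4f}")
```

With these (arbitrary) settings, the run on m binned features typically yields a noticeably smaller RMSEP than the run on all p wavelengths, mirroring the abstract's point that a compact feature representation suppresses the finite-sample error terms and approaches the asymptotic error.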

