Article

A new strategy to prevent over-fitting in partial least squares models based on model population analysis

Journal

ANALYTICA CHIMICA ACTA
Volume 880, Pages 32-41

Publisher

ELSEVIER
DOI: 10.1016/j.aca.2015.04.045

Keywords

Partial least squares; Over-fitting; Model population analysis; Model selection; Model stability; Cross-validation

Funding

  1. National Nature Foundation Committee of P.R. China [21075138, 21105129, 21175157, 21275164, 81402853GN1, 21465016]


Partial least squares (PLS) is one of the most widely used methods for chemical modeling. However, like many other methods with tunable parameters, it has a strong tendency to over-fit. A crucial step in building a PLS model is therefore selecting the optimal number of latent variables (nLVs). Cross-validation (CV) is the most popular method for PLS model selection because it selects a model on the basis of prediction ability. However, CV may not yield a clear minimum of the prediction error, which makes model selection difficult. To solve this problem, we propose a new strategy for PLS model selection that combines the cross-validated coefficient of determination (Q²_cv) and model stability (S). S is defined as the stability of the PLS regression vectors and is obtained using model population analysis (MPA). The results show that, when Q²_cv has no clear maximum, S provides additional information about over-fitting and helps in finding the optimal nLVs. Compared with other regression-vector-based indicators such as the Euclidean 2-norm (B2), the Durbin-Watson statistic (DW) and the jaggedness (J), S is more sensitive to over-fitting. The model selected by our method has both good prediction ability and good stability. © 2015 Elsevier B.V. All rights reserved.
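The selection procedure summarized above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. PLS1 is fitted with the NIPALS algorithm, and Q²_cv comes from 5-fold CV. The model population is built by Monte Carlo sampling of 80% of the samples, repeated 200 times. The abstract does not give the formula for S, so the hypothetical `stability` function assumes S is the mean over variables of |mean(b_j)| / std(b_j) across the sub-model regression vectors. B2, DW and J use common definitions: the 2-norm of b; the sum of squared successive differences of b divided by the sum of squares of b; and the 2-norm of the first differences of b. These may differ in detail from the definitions used in the paper. All function names are illustrative.

```python
import numpy as np


def pls1_coef(X, y, n_lv):
    """Regression vector b of a PLS1 model with n_lv latent variables (NIPALS).

    X (n x p) and y (n,) are assumed to be mean-centred already.
    """
    X = np.array(X, dtype=float)  # working copies that get deflated
    y = np.array(y, dtype=float)
    p = X.shape[1]
    W, P, q = np.zeros((p, n_lv)), np.zeros((p, n_lv)), np.zeros(n_lv)
    for a in range(n_lv):
        w = X.T @ y
        w /= np.linalg.norm(w)            # weight vector
        t = X @ w                         # score vector
        tt = t @ t
        P[:, a] = X.T @ t / tt            # X loadings
        q[a] = y @ t / tt                 # y loading
        W[:, a] = w
        X -= np.outer(t, P[:, a])         # deflate X and y
        y -= q[a] * t
    return W @ np.linalg.solve(P.T @ W, q)  # b = W (P'W)^-1 q


def fit(X, y, n_lv):
    """Centre the data, fit PLS1 and return (b, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    return pls1_coef(X - x_mean, y - y_mean, n_lv), x_mean, y_mean


def q2_cv(X, y, n_lv, k=5, seed=0):
    """Cross-validated coefficient of determination, 1 - PRESS / SS_total (k-fold CV)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    press = 0.0
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        b, x_mean, y_mean = fit(X[train], y[train], n_lv)
        y_hat = (X[fold] - x_mean) @ b + y_mean
        press += np.sum((y[fold] - y_hat) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def stability(X, y, n_lv, n_models=200, ratio=0.8, seed=0):
    """Hypothetical stability index S from a Monte Carlo model population.

    Fits one PLS1 model per random subset (ratio * n samples, drawn without
    replacement) and returns the mean over variables of |mean(b_j)| / std(b_j).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    m = int(ratio * n)
    subsets = [rng.choice(n, m, replace=False) for _ in range(n_models)]
    B = np.array([fit(X[s], y[s], n_lv)[0] for s in subsets])  # n_models x p
    return float(np.mean(np.abs(B.mean(axis=0)) / B.std(axis=0)))


def b2(b):
    """Euclidean 2-norm of the regression vector."""
    return float(np.linalg.norm(b))


def durbin_watson(b):
    """Sum of squared successive differences of b divided by its sum of squares."""
    return float(np.sum(np.diff(b) ** 2) / np.sum(b ** 2))


def jaggedness(b):
    """2-norm of the first differences of b."""
    return float(np.linalg.norm(np.diff(b)))


if __name__ == "__main__":
    # Synthetic data: 60 samples, 30 variables, 3 of them informative.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(60, 30))
    y = X[:, :3] @ np.array([1.0, -0.5, 0.8]) + 0.1 * rng.normal(size=60)
    print("nLV  Q2cv      S     B2     DW      J")
    for a in range(1, 9):
        b = fit(X, y, a)[0]
        print(f"{a:3d} {q2_cv(X, y, a):6.3f} {stability(X, y, a):6.2f} "
              f"{b2(b):6.2f} {durbin_watson(b):6.2f} {jaggedness(b):6.2f}")
```

The idea is to scan Q²_cv and S side by side across nLVs. This should help reveal the point where adding latent variables stops improving prediction while the regression vector begins to lose stability. The abstract does not specify how the two criteria are combined into a single decision rule, so the sketch only reports them.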
