Boosting model performance and interpretation by entangling preprocessing selection and variable selection

ANALYTICA CHIMICA ACTA (2016)

Journal

ANALYTICA CHIMICA ACTA

Volume 938, Pages 44-52

Publisher

ELSEVIER SCIENCE BV

DOI: 10.1016/j.aca.2016.08.022

Keywords

Design of experiments; Variable selection; Preprocessing selection; Partial least squares; Chemometrics

Category

Chemistry, Analytical

Funding

Netherlands Organization for Scientific Research (NWO) of Technology Area COAST

Abstract

The aim of data preprocessing is to remove data artifacts, such as a baseline, scatter effects or noise, and to enhance the contextually relevant information. Many preprocessing methods exist to deliver one or more of these benefits, but it is difficult to select which method or combination of methods should be used for the specific data being analyzed. Recently, we have shown that a preprocessing selection approach based on Design of Experiments (DoE) enables correct selection of highly appropriate preprocessing strategies within reasonable time frames. In that approach, the focus was solely on improving the predictive performance of the chemometric model. This is, however, only one of the two relevant criteria in modeling: interpretation of the model results can be just as important. Variable selection is often used to achieve such interpretation. Data artifacts, however, may hamper proper variable selection by masking the truly relevant variables. The choice of preprocessing therefore has a huge impact on the outcome of variable selection methods and may thus hamper an objective interpretation of the final model. To enhance such objective interpretation, we here integrate variable selection into the preprocessing selection approach that is based on DoE. We show that the entanglement of preprocessing selection and variable selection improves not only the interpretation, but also the predictive performance of the model. This is achieved by analyzing several experimental data sets for which the truly relevant variables are available as prior knowledge. We show that the selected variables agree more closely with the truly informative variables than those obtained when both model aspects are optimized individually. Importantly, the approach presented in this work is generic. Different types of models (e.g. PCR, PLS, ...) can be incorporated into it, as well as different variable selection methods and different preprocessing methods, according to the taste and experience of the user. In this work, the approach is illustrated by using PLS as the model and PPRV-FCAM (Predictive Property Ranked Variable using Final Complexity Adapted Models) for variable selection. © 2016 The Authors. Published by Elsevier B.V.
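
The idea of evaluating preprocessing and variable selection together, rather than one after the other, can be illustrated with a small sketch. It is not the authors' implementation. It assumes a two-level full factorial design over two hypothetical preprocessing steps (`remove_offset`, `smooth`), replaces PPRV-FCAM with a plain correlation ranking, and replaces PLS with a one-predictor least-squares line on the mean of the selected variables. The data are synthetic, and every function name is made up for illustration.

```python
from itertools import product
from statistics import mean
import random

def remove_offset(X):
    # Hypothetical baseline step: subtract each sample's own minimum.
    return [[v - min(row) for v in row] for row in X]

def smooth(X):
    # Hypothetical noise step: 3-point moving average along each sample.
    return [[mean(row[max(0, j - 1):j + 2]) for j in range(len(row))] for row in X]

STEPS = {"baseline": remove_offset, "smoothing": smooth}  # design factors

def corr(a, b):
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5 if saa > 0 and sbb > 0 else 0.0

def select_variables(X, y, k):
    # Stand-in for PPRV-FCAM: keep the k variables most correlated with y.
    cols = list(zip(*X))
    ranked = sorted(range(len(cols)), key=lambda j: -abs(corr(cols[j], y)))
    return sorted(ranked[:k])

def fit_line(x, y):
    # Stand-in for PLS: ordinary least squares with a single predictor.
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx if sxx > 0 else 0.0
    return slope, my - slope * mx

def loo_rmse(X, y, k):
    # Leave-one-out error; variables are re-selected inside every fold.
    errors = []
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        keep = select_variables(X_train, y_train, k)
        def score(row):
            return mean(row[j] for j in keep)
        slope, intercept = fit_line([score(r) for r in X_train], y_train)
        errors.append((slope * score(X[i]) + intercept - y[i]) ** 2)
    return mean(errors) ** 0.5

def run_design(X, y, k=2):
    # One run per on/off combination of the preprocessing steps.
    results = []
    for levels in product([False, True], repeat=len(STEPS)):
        Xp = X
        for step, switched_on in zip(STEPS.values(), levels):
            if switched_on:
                Xp = step(Xp)
        results.append({"settings": dict(zip(STEPS, levels)),
                        "rmse": loo_rmse(Xp, y, k),
                        "selected": select_variables(Xp, y, k)})
    return sorted(results, key=lambda r: r["rmse"])

# Synthetic data: variables 3 and 4 carry the signal, and every sample
# has a random additive offset that acts as a baseline artifact.
rng = random.Random(0)
y = [float(i) for i in range(12)]
X = []
for target in y:
    offset = rng.uniform(0, 20)
    row = [offset + rng.gauss(0, 0.3) for _ in range(8)]
    row[3] += target
    row[4] += target
    X.append(row)

results = run_design(X, y)
for r in results:
    print(r)
```

Each printed entry pairs one preprocessing combination with its cross-validated error and the variables chosen under that combination, so both criteria from the abstract (prediction and interpretation) can be inspected per design run.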
