Article

Comprehensive new approaches for variable selection using ordered predictors selection

Journal

ANALYTICA CHIMICA ACTA
Volume 1075, Pages 57-70

Publisher

ELSEVIER SCIENCE BV
DOI: 10.1016/j.aca.2019.05.039

Keywords

Multivariate regression; Feature selection; Chemometrics; Informative vector; Prediction power

Funding

  1. Coordenacao de Aperfeicoamento de Pessoal de Nivel Superior - Brasil (CAPES) [001]
  2. Conselho Nacional de Desenvolvimento Cientifico e Tecnologico (CNPq) [310303/2015-0, 310503/2015-9]
  3. Fundacao de Amparo a Pesquisa do Estado de Minas Gerais (FAPEMIG)

New strategies for ordered predictors selection (OPS) were developed in this work, making the method more versatile and expanding its use and applicability worldwide. OPS is an established method for selecting variables in multivariate regression and is used by analytical chemists and chemometricians. It has shown a strong ability to improve model predictions by selecting a small number of important variables. At its core, OPS sorts the variables according to informative vectors and systematically investigates regression models built on the sorted variables, identifying the most relevant set of variables by comparing the models' cross-validation parameters. However, the first version of OPS uses only one informative vector at a time and is limited to a single variable selection run. Therefore, three new strategies were proposed. First, an automatic method was developed to perform variable selection using several informative vectors and their combinations. Second, feedback OPS is presented, a strategy in which the pre-selected variables are returned to a new round of selection. Last, a method for applying OPS to subdivisions (intervals) of the full variable array, called OPS intervals, was established. The new strategies were first applied to the six datasets used in the original OPS paper to compare their prediction performance with that of the original algorithm. Twelve new datasets were then used to test the new OPS approaches and compare them with other variable selection methods: the genetic algorithm (GA), the interval successive projections algorithm for PLS (iSPA), and recursive weighted partial least squares (rPLS). The new OPS approaches outperformed both the first OPS version and the other variable selection methods. Besides offering greater predictive capacity, they were markedly more accurate in selecting the expected variables.
Overall, the new OPS approaches provided the best sets of selected variables for building more predictive and interpretable regression models, proving efficient for variable selection in different types of datasets. (C) 2019 Elsevier B.V. All rights reserved.
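The abstract describes OPS only in outline, so the following is a minimal, hypothetical Python/NumPy sketch of the idea, not the authors' implementation. It assumes that the informative vector is the absolute PLS regression vector (only one of the several informative vectors OPS can use), that the window of top-ranked variables grows one variable at a time, and that the only selection criterion is RMSECV from venetian-blinds cross-validation (the original method compares several cross-validation parameters). `feedback_ops` reruns OPS on the previous selection until it stops shrinking, and `ops_intervals` runs OPS inside contiguous blocks of variables and keeps the best block; both are simplified readings of the feedback and interval strategies, and every function and parameter name here is hypothetical.

```python
import numpy as np


def pls1(X, y, n_comp):
    """NIPALS PLS1 on mean-centred data; returns (coefficients, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(min(n_comp, X.shape[1], X.shape[0] - 1)):
        w = E.T @ f
        norm = np.linalg.norm(w)
        if norm < 1e-12:  # no covariance left to explain
            break
        w = w / norm
        t = E @ w
        tt = t @ t
        p, q = E.T @ t / tt, (f @ t) / tt
        E, f = E - np.outer(t, p), f - q * t  # deflate X and y
        W.append(w)
        P.append(p)
        Q.append(q)
    if not W:
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P = np.array(W).T, np.array(P).T
    b = W @ np.linalg.solve(P.T @ W, np.array(Q))  # b = W (P'W)^-1 q
    return b, x_mean, y_mean


def rmsecv(X, y, n_comp, n_folds=5):
    """Root mean squared error of venetian-blinds cross-validation."""
    folds = np.arange(len(y)) % n_folds
    sq_err = 0.0
    for k in range(n_folds):
        train, test = folds != k, folds == k
        b, x_mean, y_mean = pls1(X[train], y[train], n_comp)
        y_hat = (X[test] - x_mean) @ b + y_mean
        sq_err += np.sum((y[test] - y_hat) ** 2)
    return np.sqrt(sq_err / len(y))


def ops(X, y, n_comp=3, start=2, step=1):
    """One OPS run: rank variables by |PLS regression vector|, grow the window, keep the best."""
    b, _, _ = pls1(X, y, n_comp)  # informative vector (assumed choice)
    order = np.argsort(-np.abs(b))  # most informative variables first
    best_idx, best_err = order, np.inf
    n_vars = X.shape[1]
    for k in range(min(start, n_vars), n_vars + 1, step):
        idx = order[:k]
        err = rmsecv(X[:, idx], y, min(n_comp, k))
        if err < best_err:  # assumed criterion: lowest RMSECV only
            best_idx, best_err = idx, err
    return np.sort(best_idx), best_err


def feedback_ops(X, y, n_comp=3, max_rounds=5):
    """Hypothetical feedback loop: rerun OPS on the previous selection until it stops shrinking."""
    selected, best_err = np.arange(X.shape[1]), np.inf
    for _ in range(max_rounds):
        sub, err = ops(X[:, selected], y, n_comp)
        shrank = len(sub) < len(selected)
        selected, best_err = selected[sub], err  # map back to original column indices
        if not shrank:
            break
    return selected, best_err


def ops_intervals(X, y, n_intervals=4, n_comp=3):
    """Hypothetical interval variant: run OPS inside each contiguous block, keep the best block."""
    best_sel, best_err = None, np.inf
    for block in np.array_split(np.arange(X.shape[1]), n_intervals):
        sub, err = ops(X[:, block], y, n_comp)
        if err < best_err:
            best_sel, best_err = block[sub], err
    return best_sel, best_err
```

For example, with a synthetic `X` of shape (60, 30) and `y` built from columns 3 and 10 plus small noise, `ops(X, y)` and `feedback_ops(X, y)` would be expected to keep both columns. Because the previous subset is always one of the candidate windows in the next feedback round, the loop should not increase RMSECV apart from rounding.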