Article

Variable selection by permutation applied in support vector regression models

Journal

JOURNAL OF CHEMOMETRICS
Volume 36, Issue 10

Publisher

WILEY
DOI: 10.1002/cem.3444

Keywords

chemometrics; noise subwindow permutation analysis; subwindow permutation analysis; support vector machine; variable selection

Funding

  1. Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) [001]
  2. CNPq [4/2021, 310349/2021-4]
  3. FAPES [03/2021, 442/2021]


This study demonstrates the potential of two permutation-based variable selection methods, subwindow permutation analysis (SPA) and noise incorporated subwindow permutation analysis (NISPA), to overcome a key limitation of support vector regression (SVR): its black-box nature. Applied to FTIR data of crude oil samples, these methods produced more accurate and parsimonious models and revealed the most important variables for building the SVR models.
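The permutation idea behind subwindow permutation analysis (SPA) can be illustrated with a minimal, model-agnostic sketch. It is a simplified illustration under stated assumptions, not the authors' implementation. In each Monte Carlo run it draws a random training/validation split and a random subset of variables (a "subwindow"), fits a model on that subset, and records how much the validation RMSE rises when each variable's values are shuffled. Variables whose permutation consistently increases the error are treated as important. To keep the example dependency-free, a hypothetical k-nearest-neighbours regressor (`knn_fit`) stands in for SVR. The names `subwindow_permutation_importance`, `n_runs`, `window` and `train_frac` are illustrative, and the score is a plain mean error increase rather than the scoring rule used in the published SPA and NISPA procedures. The noise-incorporation step that gives NISPA its name is omitted.

```python
import math
import random
from statistics import mean


def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length sequences."""
    return math.sqrt(mean((a - b) ** 2 for a, b in zip(y_true, y_pred)))


def knn_fit(X, y, k=3):
    """Hypothetical stand-in for SVR: a k-nearest-neighbours regressor."""
    def predict(rows):
        preds = []
        for r in rows:
            # Indices of the k training samples closest to r (squared Euclidean distance).
            nearest = sorted(
                range(len(X)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], r)),
            )[:k]
            preds.append(mean(y[i] for i in nearest))
        return preds
    return predict


def take(X, rows, cols):
    """Sub-matrix of X restricted to the given sample rows and variable columns."""
    return [[X[i][c] for c in cols] for i in rows]


def subwindow_permutation_importance(X, y, fit=knn_fit, n_runs=200,
                                     window=2, train_frac=0.7, seed=0):
    """Simplified SPA-style score: mean rise in validation RMSE after permuting each variable."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    deltas = {j: [] for j in range(p)}
    for _ in range(n_runs):
        idx = list(range(n))
        rng.shuffle(idx)                      # random training/validation split
        cut = int(train_frac * n)
        train, val = idx[:cut], idx[cut:]
        cols = rng.sample(range(p), window)   # random variable subwindow
        model = fit(take(X, train, cols), [y[i] for i in train])
        Xv, yv = take(X, val, cols), [y[i] for i in val]
        base = rmse(yv, model(Xv))
        for pos, c in enumerate(cols):
            shuffled = [row[pos] for row in Xv]
            rng.shuffle(shuffled)             # break the link between variable c and y
            Xp = [row[:pos] + [v] + row[pos + 1:] for row, v in zip(Xv, shuffled)]
            deltas[c].append(rmse(yv, model(Xp)) - base)
    return {j: (mean(d) if d else 0.0) for j, d in deltas.items()}


# Synthetic demo: y depends only on variable 0; variables 1 and 2 are pure noise.
rng = random.Random(1)
X = [[rng.random() for _ in range(3)] for _ in range(60)]
y = [5 * row[0] for row in X]
imp = subwindow_permutation_importance(X, y)
for j, score in sorted(imp.items(), key=lambda kv: -kv[1]):
    print(f"variable {j}: mean RMSE increase {score:.3f}")
```

With this synthetic data, variable 0 is expected to receive by far the largest score, while the noise variables stay near zero because shuffling them does not remove any information about y. Swapping `knn_fit` for an SVR fit function would follow the same loop.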
Support vector regression (SVR) can be considered a black-box machine-learning method, so identifying cause/effect relationships and synergism among the most important variables is difficult. This study demonstrates the potential of two permutation-based variable selection methods, subwindow permutation analysis (SPA) and noise incorporated subwindow permutation analysis (NISPA), to overcome this limitation. The application of these two methods to SVR is poorly explored in the literature, particularly for regression problems. The algorithms were applied to FTIR (Fourier transform mid-infrared spectroscopy) data of crude oil samples to estimate API gravity, kinematic viscosity at 50 °C, and saturates, aromatics, resins, and asphaltene contents. The results were compared with those of other variable selection methods. SPA and NISPA provided the most accurate models for kinematic viscosity, saturates, and aromatics content. The root-mean-squared percentage errors of prediction (RMSPEP) of SPA and NISPA were, respectively, 14.26% and 14.62% for kinematic viscosity, 4.7 wt% and 4.4 wt% for saturates content, and 3.4 wt% and 3.1 wt% for aromatics content. For API gravity, although its accuracy was similar to that of the other selection methods, SPA produced a simpler model that used only 3.5% of the 3351 variables, with a root-mean-squared error of prediction (RMSEP) of 1.0 and an R²p of 0.981. Overall, SPA and NISPA generally produced faster, more accurate, and more parsimonious models, and they revealed the most important variables for building the SVR models.
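The figures of merit quoted above can be computed as follows. This is a sketch under common textbook definitions, which are assumptions here because the abstract does not define them. RMSEP is taken as the root-mean-squared prediction residual. RMSPEP is the same quantity computed on residuals relative to the reference value and expressed in percent. R²p is 1 − SS_res/SS_tot on the prediction set; some authors instead use the squared correlation coefficient. The function names and the three-sample example are hypothetical and are not data from the study. The last line checks the variable count for the API gravity model: 3.5% of 3351 variables is about 117.

```python
import math


def rmsep(y_true, y_pred):
    """Root-mean-squared error of prediction, in the units of y."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)


def rmspep(y_true, y_pred):
    """Root-mean-squared percentage error of prediction (assumed definition), in %."""
    n = len(y_true)
    return 100 * math.sqrt(sum(((a - b) / a) ** 2 for a, b in zip(y_true, y_pred)) / n)


def r2_pred(y_true, y_pred):
    """Coefficient of determination on the prediction set: 1 - SS_res / SS_tot."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - y_mean) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical reference and predicted values (illustrative only, not from the paper).
y_ref = [10.0, 20.0, 40.0]
y_hat = [11.0, 18.0, 44.0]
print(rmsep(y_ref, y_hat))    # sqrt(7), about 2.646
print(rmspep(y_ref, y_hat))   # about 10.0 (%)
print(r2_pred(y_ref, y_hat))  # about 0.955

# Size of the SPA model for API gravity: 3.5% of 3351 variables.
n_selected = round(0.035 * 3351)  # 117
```

Because RMSPEP divides each residual by its reference value, it is unitless and suits properties such as viscosity that span a wide range. RMSEP stays in the property's own units (for example, degrees API or wt%).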
