Article

Generalized Support Vector Regression and Symmetry Functional Regression Approaches to Model the High-Dimensional Data

Journal

SYMMETRY-BASEL
Volume 15, Issue 6

Publisher

MDPI
DOI: 10.3390/sym15061262

Keywords

functional regression; high-dimensional data; lasso regression; ridge regression; support vector regression


Classical regression approaches are not suitable for analyzing high-dimensional datasets with more explanatory variables than observations, as the results can be misleading. In this study, we propose using modern techniques like support vector regression, symmetry functional regression, ridge, and lasso regression methods to analyze such data. We introduce a generalized support vector regression approach that improves the performance of support vector regression by accurately estimating the penalty parameter using cross-validation. We evaluate the efficiency of the proposed estimators based on three criteria and apply them to real and simulated high-dimensional datasets.
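The cross-validated choice of a penalty parameter mentioned above can be illustrated with a small sketch. It is not the generalized support vector regression algorithm of the paper: as a stand-in it uses ridge regression (one of the methods the study lists), fitted in its dual form so that only an n x n system is solved when there are more variables than observations. The function names, the penalty grid, the number of folds, and the simulated data are all assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge coefficients (no intercept) via the dual form
    beta = X^T (X X^T + lam * I)^{-1} y, which solves an n x n system."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

def cv_score(X, y, lam, k=5, seed=0):
    """Mean squared prediction error of ridge over k held-out folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)  # every index not in this fold
        beta = ridge_fit(X[train], y[train], lam)
        errors.append(np.mean((y[fold] - X[fold] @ beta) ** 2))
    return float(np.mean(errors))

def select_penalty(X, y, grid, k=5):
    """Return the grid value with the smallest cross-validation score."""
    scores = [cv_score(X, y, lam, k) for lam in grid]
    return grid[int(np.argmin(scores))], scores

# Hypothetical simulated data with more variables (p) than observations (n).
rng = np.random.default_rng(42)
n, p = 40, 200
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 2.0  # only the first five variables matter
y = X @ beta_true + 0.5 * rng.standard_normal(n)

grid = [0.01, 0.1, 1.0, 10.0, 100.0]  # assumed candidate penalties
best_lam, scores = select_penalty(X, y, grid)
print("selected penalty:", best_lam)
```

The same folds are reused for every candidate penalty (fixed seed), so the scores are directly comparable across the grid. A finer or data-driven grid would be a natural refinement.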
Classical regression approaches are not applicable to high-dimensional datasets in which the number of explanatory variables exceeds the number of observations, and their results may be misleading. In this research, we propose analyzing such data with modern techniques such as support vector regression, symmetry functional regression, ridge regression, and lasso regression. We develop an extension of support vector regression, called generalized support vector regression, to provide more efficient shrinkage estimation and variable selection in high-dimensional datasets. Generalized support vector regression can improve the performance of support vector regression by employing an accurate algorithm that obtains the optimum value of the penalty parameter from a cross-validation score, which is an asymptotically unbiased feasible estimator of the risk function. We apply the proposed methods to two real high-dimensional datasets (yeast gene data and riboflavin data) and a simulated dataset, and determine the most efficient model for each type of dataset using three criteria: squared correlation, mean squared error, and mean absolute error percentage deviation. The efficiency of the proposed estimators is evaluated on the basis of these criteria.
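The three evaluation criteria named in the abstract can be written out as a short sketch. The abstract does not give their formulas, so the definitions below are assumptions: squared correlation is taken as the squared Pearson correlation between observed and predicted values, and the percentage criterion is taken as the usual mean absolute percentage error. The function names and the toy numbers are hypothetical.

```python
def squared_correlation(y_true, y_pred):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((a - mean_t) * (b - mean_p) for a, b in zip(y_true, y_pred))
    var_t = sum((a - mean_t) ** 2 for a in y_true)
    var_p = sum((b - mean_p) ** 2 for b in y_pred)
    return cov ** 2 / (var_t * var_p)

def mean_squared_error(y_true, y_pred):
    """Average of the squared prediction errors."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_percentage_error(y_true, y_pred):
    """Average absolute error relative to the observed value, in percent.
    Assumes no observed value is zero."""
    n = len(y_true)
    return 100.0 * sum(abs(a - b) / abs(a) for a, b in zip(y_true, y_pred)) / n

# Toy observed and predicted responses (made up for illustration).
y_obs = [2.0, 4.0, 5.0, 8.0]
y_hat = [2.5, 3.5, 5.0, 7.0]

r2 = squared_correlation(y_obs, y_hat)
mse_val = mean_squared_error(y_obs, y_hat)               # 0.375
mape_val = mean_absolute_percentage_error(y_obs, y_hat)  # 12.5
print(r2, mse_val, mape_val)
```

Under these definitions a better model has a higher squared correlation and lower values of the other two criteria, which is how competing estimators would be ranked on a held-out or simulated dataset.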
