Article

Generalized Support Vector Regression and Symmetry Functional Regression Approaches to Model the High-Dimensional Data

Journal

SYMMETRY-BASEL
Volume 15, Issue 6

Publisher

MDPI
DOI: 10.3390/sym15061262

Keywords

functional regression; high-dimensional data; lasso regression; ridge regression; support vector regression

Classical regression approaches are not suitable for analyzing high-dimensional datasets with more explanatory variables than observations, as the results can be misleading. In this study, we propose using modern techniques like support vector regression, symmetry functional regression, ridge, and lasso regression methods to analyze such data. We introduce a generalized support vector regression approach that improves the performance of support vector regression by accurately estimating the penalty parameter using cross-validation. We evaluate the efficiency of the proposed estimators based on three criteria and apply them to real and simulated high-dimensional datasets.
Classical regression approaches are not applicable to high-dimensional datasets in which the number of explanatory variables exceeds the number of observations, and their results may be misleading. In this research, we propose analyzing such data with modern techniques such as support vector regression, symmetry functional regression, ridge regression, and lasso regression. We extend the support vector regression approach into what we call generalized support vector regression, which provides more efficient shrinkage estimation and variable selection in high-dimensional datasets. Generalized support vector regression can improve on the performance of support vector regression by employing an accurate algorithm that obtains the optimum value of the penalty parameter from a cross-validation score, which is an asymptotically unbiased feasible estimator of the risk function. We apply the proposed methods to two real high-dimensional datasets (yeast gene data and riboflavin data) and a simulated dataset, and determine the most efficient model for each type of dataset based on three criteria: squared correlation, mean squared error, and mean absolute error percentage deviation. The efficiency of the proposed estimators is evaluated on the basis of these criteria.
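The three evaluation criteria named in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and it assumes that "mean absolute error percentage deviation" corresponds to the usual mean absolute percentage error, which the abstract does not define.

```python
def squared_correlation(y_true, y_pred):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    # Undefined (division by zero) if either series is constant.
    return cov ** 2 / (var_t * var_p)


def mean_squared_error(y_true, y_pred):
    """Average of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percentage_error(y_true, y_pred):
    """Average absolute error relative to the observed value, in percent.

    Assumption: this is the standard MAPE; it requires nonzero observations.
    """
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n
```

Under these definitions, a model would be preferred when it gives a higher squared correlation and lower values of the two error measures on the same held-out data.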
