Article

Examining variable selection methods for the predictive performance of regression models and the proportion of selected variables and selected random variables

Journal

HELIYON
Volume 7, Issue 6

Publisher

CELL PRESS
DOI: 10.1016/j.heliyon.2021.e07356

Keywords

Variable selection; Feature selection; Regression; Predictive accuracy; Interpretability; QSPR; QSAR

Funding

  1. Japan Society for the Promotion of Science [JP19K15352]

This study evaluated the prediction accuracy of models constructed using the selected X and examined how many variables, including variables unrelated to y, each variable selection method selected. The findings suggest that even when variables unrelated to y are selected, accurate models can still be constructed by applying various regression analysis methods.
The selection of descriptors (X) is crucial for improving the interpretability and prediction accuracy of a regression model. In this study, variable (feature) selection methods were evaluated in two ways: by the prediction accuracy of models constructed using the selected X, and by the selection results themselves, namely the number of selected X and the number of selected variables that are unrelated to the objective variable (y), such as an activity or a property. The variable selection methods included the least absolute shrinkage and selection operator (LASSO), genetic algorithm-based partial least squares, genetic algorithm-based support vector regression, and Boruta. Several regression analysis methods were used to test the prediction accuracy of the models constructed using the selected X, and the characteristics of each variable selection method were analyzed using eight datasets. The results showed that even when variables unrelated to y were selected, and even when the number of unrelated variables was the same as the number of original variables, a regression model with good accuracy that ignores the influence of such noise variables can be constructed by applying various regression analysis methods. Additionally, the variables related to y must not be deleted. These findings provide a basis for improving variable selection methods.
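
The evaluation idea described in the abstract (append random variables unrelated to y, run a variable selection method, then count how many variables and how many random variables were selected) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the data are synthetic, the LASSO is a small hand-written coordinate-descent version, the penalty `alpha=0.1` is arbitrary, and all function and variable names are hypothetical.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the LASSO proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def standardize(column):
    """Scale one variable to zero mean and unit (population) variance."""
    n = len(column)
    mean = sum(column) / n
    std = (sum((v - mean) ** 2 for v in column) / n) ** 0.5
    return [(v - mean) / std for v in column]


def lasso_coordinate_descent(columns, y, alpha, n_sweeps=100):
    """Minimize (1/2n)*||y - Xb||^2 + alpha*||b||_1 for standardized columns."""
    n = len(y)
    y_mean = sum(y) / n
    residual = [v - y_mean for v in y]  # all coefficients start at zero
    coef = [0.0] * len(columns)
    for _ in range(n_sweeps):
        for j, col in enumerate(columns):
            # Covariance of variable j with the residual that excludes its own term.
            rho = sum(c * (r + coef[j] * c) for c, r in zip(col, residual)) / n
            new_coef = soft_threshold(rho, alpha)
            delta = new_coef - coef[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                coef[j] = new_coef
    return coef


random.seed(0)
n_samples, n_original = 200, 5

# Synthetic "original" variables; in this toy setup only the first three affect y.
original = [[random.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_original)]
y = [3.0 * a - 2.0 * b + 1.5 * c + random.gauss(0, 0.5)
     for a, b, c in zip(original[0], original[1], original[2])]

# Append as many random variables (unrelated to y) as there are original variables.
random_vars = [[random.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_original)]
columns = [standardize(col) for col in original + random_vars]

coef = lasso_coordinate_descent(columns, y, alpha=0.1)
selected = [b != 0.0 for b in coef]  # a variable counts as selected if its coefficient is nonzero

prop_selected = sum(selected) / len(selected)
prop_random_selected = sum(selected[n_original:]) / n_original

print("proportion of selected variables:", prop_selected)
print("proportion of selected random variables:", prop_random_selected)
```

The two printed proportions correspond to the quantities named in the article title. Repeating the run with a different selection method or a different `alpha` shows how strongly the count of selected random variables depends on the method and its settings.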
