Article

Using Generalized Correlation to Effect Variable Selection in Very High Dimensional Problems

Publisher

TAYLOR & FRANCIS INC
DOI: 10.1198/jcgs.2009.08041

Keywords

Bootstrap; Classification; Errors in variables; Generalized correlation; Hidden explanatory variables; Instrumental variables; Linear model; Measurement error; Regression

Funding

  1. Direct For Mathematical & Physical Scien
  2. Division Of Mathematical Sciences [0906795] Funding Source: National Science Foundation

Using the traditional linear model to implement variable selection can perform very effectively in some cases, provided the response to relevant components is approximately monotone and its gradient changes only slowly. In other circumstances, nonlinearity of response can result in significant vector components being overlooked. Even if good results are obtained by linear model fitting, they can sometimes be bettered by using a nonlinear approach. These circumstances can arise in practice, with real data, and they motivate alternative methodologies. We suggest an approach based on ranking generalized empirical correlations between the response variable and components of the explanatory vector. This technique is not prediction-based, and can identify variables that are influential but not explicitly part of a predictive model. We explore the method's performance for real and simulated data, and give a theoretical argument demonstrating its validity. The method can also be used in conjunction with, rather than as an alternative to, conventional prediction-based variable selections, by providing a preliminary massive dimension reduction step as a prelude to using alternative techniques (e.g., the adaptive lasso) that do not always cope well with very high dimensions. Supplemental materials relating to the numerical sections of this paper are available online.
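
The ranking idea described in the abstract can be illustrated with a minimal sketch. It assumes that the generalized correlation between the response and one explanatory component is the largest correlation attainable between the response and a polynomial of degree at most three in that component. The function names, the choice of polynomial class, and the simulated data are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def generalized_correlation(x, y, degree=3):
    """Largest sample correlation between y and a polynomial in x (assumed function class)."""
    sd = x.std()
    if sd == 0:
        return 0.0  # a constant component carries no information
    z = (x - x.mean()) / sd  # standardize for numerical stability
    basis = np.vander(z, degree + 1, increasing=True)  # columns 1, z, z^2, ..., z^degree
    coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
    fitted = basis @ coef
    if fitted.std() < 1e-12:
        return 0.0
    # The least-squares fit maximizes correlation with y over the span of the basis.
    return float(np.corrcoef(fitted, y)[0, 1])

def rank_variables(X, y, degree=3):
    """Return component indices sorted by decreasing generalized correlation, plus the scores."""
    scores = np.array([generalized_correlation(X[:, j], y, degree)
                       for j in range(X.shape[1])])
    order = np.argsort(-scores)
    return order, scores

# Hypothetical simulated data: only component 0 matters, and its effect is
# non-monotone, so an ordinary linear correlation is close to zero in expectation.
rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.standard_normal((n, p))
y = X[:, 0] ** 2 + 0.5 * rng.standard_normal(n)

order, scores = rank_variables(X, y)
top_k = order[:20]  # components kept by the preliminary screening step
print("top-ranked component:", order[0])
```

In this sketch, the columns indexed by `top_k` could then be passed to a prediction-based selector, in the spirit of the preliminary dimension reduction step mentioned above.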
