Article

Variable Selection Using Adaptive Nonlinear Interaction Structures in High Dimensions

Journal

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
Volume 105, Issue 492, Pages 1541-1553

Publisher

AMER STATISTICAL ASSOC
DOI: 10.1198/jasa.2010.tm10130

Keywords

Heredity structure; Interactions; Nonlinear regression; Regularization; Variable selection

Funding

  1. NSF [DMS-0705312, DMS-0906784]
  2. Division Of Mathematical Sciences
  3. Direct For Mathematical & Physical Scien [0906784] Funding Source: National Science Foundation

Numerous penalization-based methods have been proposed for fitting a traditional linear regression model in which the number of predictors, p, is large relative to the number of observations, n. Most of these approaches assume sparsity in the underlying coefficients and perform some form of variable selection. Recently, some of this work has been extended to nonlinear additive regression models. However, in many contexts one wishes to allow for the possibility of interactions among the predictors. This poses serious statistical and computational difficulties when p is large, as the number of candidate interaction terms is of order p^2. We introduce a new approach, Variable selection using Adaptive Nonlinear Interaction Structures in High dimensions (VANISH), that is based on a penalized least squares criterion and is designed for high-dimensional nonlinear problems. Our criterion is convex and enforces the heredity constraint; that is, if an interaction term is added to the model, then the corresponding main effects are automatically included. We provide theoretical conditions under which VANISH will select the correct main effects and interactions. These conditions suggest that VANISH should outperform certain natural competitors when the true interaction structure is sufficiently sparse. Detailed simulation results are also provided, demonstrating that VANISH is computationally efficient and can be applied to nonlinear models involving thousands of terms while producing superior predictive performance over other approaches.
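
Two points in the abstract can be made concrete with a small sketch: how quickly the number of candidate pairwise interactions grows with p, and what the heredity constraint requires of a selected model. The code below is illustrative only and is not the VANISH fitting procedure. The function names and the representation of a model as lists of predictor indices are assumptions introduced here, not taken from the paper.

```python
def count_candidate_terms(p):
    """Return (number of main effects, number of pairwise interactions) for p predictors."""
    # There are p main effects and p * (p - 1) / 2 unordered pairs, which is of order p^2.
    return p, p * (p - 1) // 2


def satisfies_heredity(main_effects, interactions):
    """Return True if every interaction (j, k) has both j and k among the main effects."""
    selected = set(main_effects)
    return all(j in selected and k in selected for j, k in interactions)


def enforce_heredity(main_effects, interactions):
    """Return the sorted main effects after adding those required by the interactions."""
    # Hypothetical post-hoc repair, shown only to illustrate the constraint.
    required = {idx for pair in interactions for idx in pair}
    return sorted(set(main_effects) | required)


if __name__ == "__main__":
    for p in (10, 100, 1000):
        mains, pairs = count_candidate_terms(p)
        print(f"p={p}: {mains} main effects, {pairs} candidate interactions")

    # A model containing interaction (0, 2) without main effect 2 violates heredity.
    print(satisfies_heredity([0], [(0, 2)]))
    print(enforce_heredity([0], [(0, 2)]))
```

Note that the repair step above patches a model after the fact, whereas the abstract describes a criterion that enforces heredity as part of the penalized fit itself.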
