Article

Variable selection with false discovery rate control in deep neural networks

Journal

NATURE MACHINE INTELLIGENCE
Volume 3, Issue 5, Pages 426-433

Publisher

NATURE PORTFOLIO
DOI: 10.1038/s42256-021-00308-z

Keywords

-

Funding

  1. National Institutes of Health [R01GM120733]
  2. American Cancer Society [RSG-17-206-01-TBG]
  3. National Science Foundation [1925645]
  4. Office of Advanced Cyberinfrastructure (OAC)
  5. Directorate for Computer & Information Science & Engineering [1925645] Funding Source: National Science Foundation

Abstract

SurvNet is a backward elimination procedure for variable selection in deep neural networks, with the ability to estimate and control the false discovery rate of selected variables. It adaptively determines how many variables to eliminate at each step to maximize selection efficiency.
Deep neural networks are famous for their high prediction accuracy, but they are also known for their black-box nature and poor interpretability. We consider the problem of variable selection in deep neural networks, that is, selecting the input variables that have significant predictive power on the output. Most existing variable selection methods for neural networks are only applicable to shallow networks or are computationally infeasible on large datasets; moreover, they lack control over the quality of selected variables. Here we propose a backward elimination procedure called SurvNet, which is based on a new measure of variable importance that applies to a wide variety of networks. More importantly, SurvNet is able to estimate and control the false discovery rate of selected variables empirically. Further, SurvNet adaptively determines how many variables to eliminate at each step in order to maximize the selection efficiency. The validity and efficiency of SurvNet are demonstrated on various simulated and real datasets, and its performance is compared with that of other methods. In particular, a systematic comparison with knockoff-based methods shows that although they offer more rigorous false discovery rate control on data with strong variable correlation, SurvNet usually has higher power.

Identifying salient input features can be a challenge in neural networks. The authors developed a variable selection procedure with false discovery rate control that works on classification or regression problems, one or multiple output neurons, and deep or shallow neural networks.
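
As a concrete illustration of the kind of procedure the abstract describes, below is a minimal Python/PyTorch sketch of backward elimination with an empirical false discovery rate proxy, loosely in the spirit of SurvNet. It is a sketch under simplifying assumptions rather than the authors' implementation: the first-layer weight-norm importance score, the permuted-copy null variables, the FDR proxy and the adaptive drop rule are stand-ins for the paper's exact definitions, and the helper names (fit_mlp, importance, select_variables) are hypothetical.

# Hedged sketch (see assumptions above): backward elimination with an
# empirical FDR proxy built from permuted "null" copies of the variables.
import numpy as np
import torch
import torch.nn as nn


def fit_mlp(X, y, hidden=32, epochs=200, lr=1e-2):
    # Train a small feed-forward regressor on the given design matrix.
    net = nn.Sequential(nn.Linear(X.shape[1], hidden), nn.ReLU(),
                        nn.Linear(hidden, 1))
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    Xt = torch.tensor(X, dtype=torch.float32)
    yt = torch.tensor(y, dtype=torch.float32).view(-1, 1)
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.mse_loss(net(Xt), yt).backward()
        opt.step()
    return net


def importance(net):
    # Per-input score: L2 norm of the first-layer weights leaving each input
    # (a simplified importance measure, not the paper's exact definition).
    W = net[0].weight.detach().numpy()          # shape (hidden, n_inputs)
    return np.sqrt((W ** 2).sum(axis=0))


def select_variables(X, y, q=0.1, max_iter=20, seed=0):
    # Backward elimination: keep retraining and dropping the least important
    # variables until the empirical FDR proxy of the surviving set is <= q.
    rng = np.random.default_rng(seed)
    surviving = np.arange(X.shape[1])
    for _ in range(max_iter):
        Xs = X[:, surviving]
        # Column-wise permuted copies of the surviving variables serve as
        # known nulls that calibrate the importance scores.
        Xnull = rng.permuted(Xs, axis=0)
        net = fit_mlp(np.hstack([Xs, Xnull]), y)
        s = importance(net)
        s_real, s_null = s[:len(surviving)], s[len(surviving):]
        # FDR proxy for the surviving set: fraction of null scores that reach
        # the score of the weakest surviving variable.
        fdr_hat = (s_null >= s_real.min()).mean()
        if fdr_hat <= q or len(surviving) <= 1:
            break
        # Adaptive step size: drop more variables when the proxy is far above
        # the target q, fewer when it is close.
        n_drop = min(max(1, int(len(surviving) * min(0.5, fdr_hat - q))),
                     len(surviving) - 1)
        surviving = surviving[np.argsort(s_real)[n_drop:]]
    return surviving


if __name__ == "__main__":
    # Toy check: only the first three of twenty variables carry signal.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((500, 20))
    y = X[:, 0] + 2 * X[:, 1] - X[:, 2] + 0.5 * rng.standard_normal(500)
    print(select_variables(X, y, q=0.1))

In the toy run at the bottom, the selector should typically return indices close to {0, 1, 2}; the permuted copies double the input dimension but provide a calibration reference that needs no distributional assumptions, a design choice this sketch borrows from knockoff-style methods rather than from the paper itself.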

