Article

Panning for gold: 'model-X' knockoffs for high dimensional controlled variable selection

Publisher

WILEY
DOI: 10.1111/rssb.12265

Keywords

False discovery rate; Generalized linear models; Genomewide association study; Knockoff filter; Logistic regression; Markov blanket; Testing for conditional independence in non-linear models

Funding

  1. Office of Naval Research [N00014-16-1-2712]
  2. Math + X award from the Simons Foundation
  3. National Science Foundation Career award [DMS-1150318]
  4. National Institutes of Health [T32GM096982]
  5. Simons Foundation

Many contemporary large-scale applications involve building interpretable models linking a large set of potential covariates to a response in a non-linear fashion, such as when the response is binary. Although this modelling problem has been extensively studied, it remains unclear how to control the fraction of false discoveries effectively even in high dimensional logistic regression, not to mention general high dimensional non-linear models. To address such a practical problem, we propose a new framework of 'model-X' knockoffs, which reads from a different perspective the knockoff procedure that was originally designed for controlling the false discovery rate in linear models. Whereas the knockoffs procedure is constrained to homoscedastic linear models with n ≥ p, the key innovation here is that model-X knockoffs provide valid inference from finite samples in settings in which the conditional distribution of the response is arbitrary and completely unknown. Furthermore, this holds no matter the number of covariates. Correct inference in such a broad setting is achieved by constructing knockoff variables probabilistically instead of geometrically. To do this, our approach requires that the covariates are random (independent and identically distributed rows) with a distribution that is known, although we provide preliminary experimental evidence that our procedure is robust to unknown or estimated distributions. To our knowledge, no other procedure solves the controlled variable selection problem in such generality but, in the restricted settings where competitors exist, we demonstrate the superior power of knockoffs through simulations. Finally, we apply our procedure to data from a case-control study of Crohn's disease in the UK, making twice as many discoveries as the original analysis of the same data.
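
The abstract does not spell out how variables are selected once knockoffs are available. As a minimal sketch, the snippet below implements the data-dependent "knockoff+" threshold from the knockoff filter literature, assuming that a statistic W_j has already been computed for each covariate (large positive values suggest the original variable matters more than its knockoff). The function names `knockoff_plus_threshold` and `select` are hypothetical, and computing the W_j themselves (for example from a fitted model on the original and knockoff covariates) is outside this sketch.

```python
def knockoff_plus_threshold(W, q):
    """Return the smallest t > 0 such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q,
    or infinity when no such t exists (assumed knockoff+ rule)."""
    # Only the nonzero |W_j| values need to be tried as thresholds.
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        neg = sum(1 for w in W if w <= -t)  # proxy count of false discoveries
        pos = sum(1 for w in W if w >= t)   # number of selections at level t
        if (1 + neg) / max(1, pos) <= q:
            return t
    return float("inf")  # nothing qualifies, so nothing is selected


def select(W, q):
    """Indices of covariates whose statistic reaches the threshold."""
    t = knockoff_plus_threshold(W, q)
    return [j for j, w in enumerate(W) if w >= t]


if __name__ == "__main__":
    # Illustrative statistics only; these are not data from the paper.
    W = [5.0, 4.0, 3.0, -1.0, 2.5, 0.5, -0.2, 6.0, 3.5, 2.0]
    print(select(W, q=0.2))
```

The negative statistics serve as an estimate of how many selected positives are false, which is why the rule needs no p-values; the extra 1 in the numerator makes the estimate conservative.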
