Article

Variable selection with ABC Bayesian forests

Publisher

OXFORD UNIV PRESS
DOI: 10.1111/rssb.12423

Keywords

approximate Bayesian computation; BART; consistency; spike-and-slab; variable selection

Funding

  1. James S. Kemper Foundation Faculty Research Fund at the University of Chicago Booth School of Business

Abstract

Few problems in statistics are as perplexing as variable selection in the presence of very many redundant covariates. The variable selection problem is most familiar in parametric environments such as the linear model or additive variants thereof. In this work, we abandon the linear model framework, which can be quite detrimental when the covariates impact the outcome in a non-linear way, and turn to tree-based methods for variable selection. Such variable screening is traditionally done by pruning down large trees or by ranking variables based on some importance measure. Despite being heavily used in practice, these ad hoc selection rules are not yet well understood from a theoretical point of view. In this work, we devise a Bayesian tree-based probabilistic method and show that it is consistent for variable selection when the regression surface is a smooth mix of p > n covariates. These results are the first model selection consistency results for Bayesian forest priors. Probabilistic assessment of variable importance is made feasible by a spike-and-slab wrapper around sum-of-trees priors. Sampling from posterior distributions over trees is inherently very difficult. As an alternative to Markov chain Monte Carlo (MCMC), we propose approximate Bayesian computation (ABC) Bayesian forests, a new ABC sampling method based on data-splitting that achieves a higher ABC acceptance rate. We show that the method is robust and successful at finding variables with high marginal inclusion probabilities. Our ABC algorithm provides a new avenue towards approximating the median probability model in non-parametric setups where the marginal likelihood is intractable.
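The data-splitting ABC scheme described above can be sketched in a few lines. This is a hedged toy illustration, not the paper's algorithm: the function `abc_forest_inclusion`, the simulated data, the uniform 1/2 inclusion prior, and the k-nearest-neighbour regressor (standing in for a BART/forest fit, to keep the sketch self-contained) are all my own simplifications. The structure it shows is the one the abstract outlines: draw a variable subset from a spike-and-slab-style prior, fit on one half of the data, and accept the draw if its predictions ("pseudo-data") land close to the held-out half, then estimate marginal inclusion probabilities from the accepted draws.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed for illustration): y depends non-linearly on x0 and x1
# only; x2..x4 are pure noise covariates.
n, p = 200, 5
X = rng.uniform(size=(n, p))
y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=n)

def knn_predict(X_tr, y_tr, X_te, k=5):
    # Crude non-parametric regressor standing in for a Bayesian forest fit.
    d = ((X_te[:, None, :] - X_tr[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d, axis=1)[:, :k]
    return y_tr[idx].mean(axis=1)

def abc_forest_inclusion(X, y, n_iter=300, eps_quantile=0.1):
    n, p = X.shape
    draws, errs = [], []
    for _ in range(n_iter):
        # Spike-and-slab-style draw: each variable included w.p. 1/2.
        gamma = rng.random(p) < 0.5
        if not gamma.any():
            continue
        # Data split: fit on one half, generate pseudo-data for the other.
        perm = rng.permutation(n)
        tr, te = perm[: n // 2], perm[n // 2:]
        pred = knn_predict(X[tr][:, gamma], y[tr], X[te][:, gamma])
        errs.append(np.mean((pred - y[te]) ** 2))
        draws.append(gamma)
    errs, draws = np.array(errs), np.array(draws)
    # ABC acceptance: keep draws whose pseudo-data sit closest to the
    # held-out half (tolerance set as a quantile of observed discrepancies).
    keep = errs <= np.quantile(errs, eps_quantile)
    # Marginal inclusion probability = frequency among accepted draws.
    return draws[keep].mean(axis=0)

probs = abc_forest_inclusion(X, y)
print(np.round(probs, 2))
```

On this toy example the dominant signal variable x0 appears in essentially every accepted draw, while the noise covariates hover around or below their 1/2 prior inclusion rate, mimicking how variables with high marginal inclusion probability are identified.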

