Article

Trimming stability selection increases variable selection robustness

Journal

MACHINE LEARNING
Volume -, Issue -, Pages -

Publisher

SPRINGER
DOI: 10.1007/s10994-023-06384-z

Keywords

Model selection; Robustness; Sparsity; Stability selection; Breakdown point


Contamination can distort estimators unless the estimation procedure is robust. However, the literature rarely discusses how contamination distorts variable selection. Among the many methods proposed for sparse model selection is Stability Selection, a meta-algorithm that wraps a base variable selection algorithm. We introduce the variable selection breakdown point, which measures the number of cases or cells that must be contaminated so that no relevant variable is detected. By combining the variable selection breakdown point with resampling, we quantify the robustness of Stability Selection. Our trimmed Stability Selection method aggregates only the models with the best performance, reducing the impact of heavily contaminated resamples.
Contamination can severely distort an estimator unless the estimation procedure is suitably robust. This is a well-known issue that has been addressed in Robust Statistics; however, the relation between contamination and distorted variable selection has rarely been considered in the literature. As for variable selection, many methods for sparse model selection have been proposed, including Stability Selection, a meta-algorithm that wraps a base variable selection algorithm and aggregates its results over resamples in order to immunize against particular data configurations. We introduce the variable selection breakdown point, which quantifies the number of cases or cells that have to be contaminated so that no relevant variable is detected. We show that particular outlier configurations can completely mislead model selection. We combine the variable selection breakdown point with resampling, resulting in the Stability Selection breakdown point, which quantifies the robustness of Stability Selection. We propose a trimmed Stability Selection that aggregates only the models with the best performance so that, heuristically, models computed on heavily contaminated resamples are trimmed away. An extensive simulation study with non-robust regression and classification algorithms, as well as with two robust regression algorithms, reveals both the potential of our approach to boost the robustness of model selection and the fragility of variable selection using non-robust algorithms, even for an extremely small cell-wise contamination rate.
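The trimming idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: componentwise L2-boosting as the base selector, subsamples of half the data, in-sample mean squared error as the "performance" used for trimming, and the names `l2_boost`, `trimmed_stability_selection`, `make_demo_data`, `trim`, and `threshold`. The abstract itself says only that the models with the best performance are aggregated.

```python
import random

def l2_boost(X, y, steps=30, nu=0.1):
    """Componentwise L2-boosting (assumed base selector).
    Returns (set of selected column indices, in-sample MSE)."""
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    norms = [sum(v * v for v in col) or 1.0 for col in cols]
    y_mean = sum(y) / n
    resid = [yi - y_mean for yi in y]  # start from the intercept-only fit
    selected = set()
    for _ in range(steps):
        # Univariate least-squares coefficient of each column on the residual.
        betas = [sum(c * r for c, r in zip(col, resid)) / nrm
                 for col, nrm in zip(cols, norms)]
        # Updating column k reduces the squared loss by betas[k]^2 * norms[k].
        j = max(range(p), key=lambda k: betas[k] ** 2 * norms[k])
        selected.add(j)
        resid = [r - nu * betas[j] * c for r, c in zip(resid, cols[j])]
    return selected, sum(r * r for r in resid) / n

def trimmed_stability_selection(X, y, n_resamples=50, trim=0.5,
                                threshold=0.6, seed=0):
    """Fit the base selector on subsamples, drop the `trim` fraction of
    models with the highest loss, and aggregate the remaining models."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    fits = []
    for _ in range(n_resamples):
        idx = rng.sample(range(n), n // 2)  # subsample without replacement
        fits.append(l2_boost([X[i] for i in idx], [y[i] for i in idx]))
    fits.sort(key=lambda fit: fit[1])  # lowest in-sample loss first
    kept = fits[:max(1, round((1 - trim) * n_resamples))]
    # Selection frequency of each variable among the kept models only.
    freq = [sum(j in sel for sel, _ in kept) / len(kept) for j in range(p)]
    stable = [j for j in range(p) if freq[j] >= threshold]
    return stable, freq

def make_demo_data(n=60, p=6, n_outliers=1, seed=1):
    """Synthetic data: only columns 0 and 1 are relevant (hypothetical setup)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [3 * row[0] - 2 * row[1] + rng.gauss(0, 0.1) for row in X]
    for i in range(n_outliers):
        X[i][0] = 50.0  # cell-wise contamination of a relevant column
    return X, y

if __name__ == "__main__":
    X, y = make_demo_data()
    for trim in (0.0, 0.5):  # trim=0.0 is untrimmed Stability Selection
        stable, freq = trimmed_stability_selection(X, y, trim=trim)
        print(f"trim={trim}: stable={stable}, freq={[round(f, 2) for f in freq]}")
```

Setting `trim=0.0` recovers plain aggregation over all resamples, so the two printed lines allow a side-by-side comparison on the same contaminated data. Whether trimming helps in a given setting depends on the contamination pattern and on the assumed performance measure.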
