Selection of variables for multivariable models: Opportunities and limitations in quantifying model stability by resampling

Journal

STATISTICS IN MEDICINE
Volume 40, Issue 2, Pages 369-381

Publisher

WILEY
DOI: 10.1002/sim.8779

Keywords

backward elimination; bootstrap; stability measures; subsampling; variable selection

Funding

  1. FWF [I-2276-N33, I-4739-B]
  2. DFG [RA 2347/8-1]

This study examines resampling-based methods for assessing the stability of statistical models after variable selection, comparing subsampling and bootstrapping. Variable inclusion frequencies and model selection frequencies are consistently estimated only by subsampling, whereas the bootstrap gives less biased and more precise estimates of the relative conditional bias and the root mean squared difference ratio. Unbiased estimation of the latter, however, requires independent covariates, so the resampling method should be chosen with care in practice.
Statistical models are often fitted to obtain a concise description of the association of an outcome variable with some covariates. Even if background knowledge is available to guide preselection of covariates, stepwise variable selection is commonly applied to remove irrelevant ones. This practice may introduce additional variability, and selection is rarely certain. However, these issues are often ignored and model stability is not questioned. Several resampling-based measures have been proposed to describe model stability, including variable inclusion frequencies (VIFs), model selection frequencies, relative conditional bias (RCB), and root mean squared difference ratio (RMSDR). The latter two were recently proposed to assess bias and variance inflation induced by variable selection. Here, we study the consistency and accuracy of resampling estimates of these measures and the optimal choice of the resampling technique. In particular, we compare subsampling and bootstrapping for assessing stability of linear, logistic, and Cox models obtained by backward elimination in a simulation study. Moreover, we exemplify the estimation and interpretation of all suggested measures in a study on cardiovascular risk. The VIF and the model selection frequency are only consistently estimated in the subsampling approach. By contrast, the bootstrap is advantageous in terms of bias and precision for estimating the RCB as well as the RMSDR. However, unbiased estimation of the latter quantity requires independence of covariates, which is rarely encountered in practice. Our study stresses the importance of addressing model stability after variable selection and shows how to cope with it.
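
As a rough illustration of the variable inclusion frequencies described above, the following Python sketch repeats backward elimination on resampled data and records how often each covariate is retained, once with subsampling and once with the bootstrap. It is not the authors' implementation. Everything specific here is an assumption made for the example: the function names, the use of AIC as the elimination criterion, the subsampling fraction of 0.5, the 200 resamples, the restriction to a linear model, and the simulated data.

```python
import numpy as np

def aic_linear(X, y, cols):
    """AIC (up to an additive constant) of an OLS fit with intercept on `cols`."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + 2 * design.shape[1]

def backward_elimination(X, y):
    """Drop one covariate at a time for as long as doing so lowers the AIC.
    The AIC criterion is an assumption of this sketch."""
    selected = list(range(X.shape[1]))
    current = aic_linear(X, y, selected)
    while selected:
        # AIC of every model obtained by removing a single covariate
        candidates = [
            (aic_linear(X, y, [j for j in selected if j != k]), k)
            for k in selected
        ]
        best_aic, drop = min(candidates)
        if best_aic >= current:
            break  # no single removal improves the AIC
        selected.remove(drop)
        current = best_aic
    return selected

def inclusion_frequencies(X, y, n_resamples=200, scheme="subsample",
                          fraction=0.5, seed=0):
    """Share of resamples in which each covariate survives backward elimination."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_resamples):
        if scheme == "bootstrap":
            idx = rng.integers(0, n, size=n)  # n draws with replacement
        else:
            # draw a fraction of the rows without replacement
            idx = rng.choice(n, size=int(fraction * n), replace=False)
        for j in backward_elimination(X[idx], y[idx]):
            counts[j] += 1
    return counts / n_resamples

# Simulated data (hypothetical): only the first of five covariates matters.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] + rng.normal(size=200)

vif_sub = inclusion_frequencies(X, y, scheme="subsample")
vif_boot = inclusion_frequencies(X, y, scheme="bootstrap")
print("subsampling:", vif_sub)
print("bootstrap:  ", vif_boot)
```

In this toy setting the strong covariate should be retained in every resample, while the noise covariates are retained only occasionally. The two schemes can give different frequencies for the noise covariates, which is the kind of discrepancy the abstract refers to.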
