
Selection of variables for multivariable models: Opportunities and limitations in quantifying model stability by resampling

STATISTICS IN MEDICINE (2021)

Journal

STATISTICS IN MEDICINE

Volume 40, Issue 2, Pages 369-381

Publisher

WILEY

DOI: 10.1002/sim.8779

Keywords

backward elimination; bootstrap; stability measures; subsampling; variable selection

Funding

FWF [I-2276-N33, I-4739-B]
DFG [RA 2347/8-1]

Automated Summary

This study examines resampling-based methods for assessing the stability of statistical models after variable selection, comparing subsampling with bootstrapping. Variable inclusion frequencies and model selection frequencies are consistently estimated only by subsampling, whereas bootstrapping gives less biased and more precise estimates of the relative conditional bias and the root mean squared difference ratio. Unbiased estimation of the latter, however, requires independent covariates, which underlines the need to choose the evaluation method carefully in practice.

Abstract

Statistical models are often fitted to obtain a concise description of the association of an outcome variable with some covariates. Even if background knowledge is available to guide preselection of covariates, stepwise variable selection is commonly applied to remove irrelevant ones. This practice may introduce additional variability, and the selection is rarely certain. However, these issues are often ignored and model stability is not questioned. Several resampling-based measures have been proposed to describe model stability, including variable inclusion frequencies (VIFs), model selection frequencies, relative conditional bias (RCB), and root mean squared difference ratio (RMSDR). The latter two were recently proposed to assess bias and variance inflation induced by variable selection. Here, we study the consistency and accuracy of resampling estimates of these measures and the optimal choice of the resampling technique. In particular, we compare subsampling and bootstrapping for assessing stability of linear, logistic, and Cox models obtained by backward elimination in a simulation study. Moreover, we exemplify the estimation and interpretation of all suggested measures in a study on cardiovascular risk. The VIF and the model selection frequency are consistently estimated only by subsampling. By contrast, the bootstrap is advantageous in terms of bias and precision for estimating the RCB as well as the RMSDR. However, unbiased estimation of the RMSDR requires independence of covariates, a condition rarely met in practice. Our study stresses the importance of addressing model stability after variable selection and shows how to cope with it.
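The resampling workflow summarized in the abstract can be sketched in plain Python (standard library only). This is a minimal illustration under stated assumptions, not the authors' implementation. It covers linear models only (no logistic or Cox models), uses backward elimination with AIC as the stopping rule, draws subsamples of size 0.5n without replacement, and relies on these working definitions, all assumed here: the VIF is the share of resamples in which a covariate is retained; the model selection frequency is the share of resamples yielding each covariate set; the RCB is the mean coefficient over resamples in which the covariate was selected, divided by the full-model estimate, minus 1; the RMSDR is the root mean squared difference between resampled coefficients (0 when not selected) and the full-model estimate, divided by the full-model standard error. The function names `solve`, `ols`, `backward_elimination`, `resample_selection` and `stability`, as well as the simulated data, are hypothetical.

```python
import math
import random
from collections import Counter


def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y, cols):
    """Least-squares fit of y on an intercept plus the covariates listed in cols."""
    rows = [[1.0] + [x[j] for j in cols] for x in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[q] for r in rows) for q in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    return beta, rss, xtx


def backward_elimination(X, y, k):
    """Backward elimination with an AIC stopping rule (assumed); dropped covariates get 0.0."""
    n = len(y)

    def aic(cols):
        # Gaussian AIC up to an additive constant: n*log(RSS/n) + 2*(number of betas)
        return n * math.log(ols(X, y, cols)[1] / n) + 2 * (len(cols) + 1)

    cols = list(range(k))
    current = aic(cols)
    while cols:
        # Try removing each remaining covariate; keep the removal with the lowest AIC
        best_aic, drop = min((aic([c for c in cols if c != j]), j) for j in cols)
        if best_aic >= current:
            break
        cols.remove(drop)
        current = best_aic
    coefs = [0.0] * k
    for j, b in zip(cols, ols(X, y, cols)[0][1:]):
        coefs[j] = b
    return coefs


def resample_selection(X, y, n_rep=200, method="bootstrap", frac=0.5, seed=1):
    """Rerun selection on bootstrap samples (n with replacement) or subsamples (frac*n without)."""
    rng = random.Random(seed)
    n, k = len(y), len(X[0])
    results = []
    for _ in range(n_rep):
        if method == "bootstrap":
            idx = [rng.randrange(n) for _ in range(n)]
        else:
            idx = rng.sample(range(n), int(frac * n))
        results.append(backward_elimination([X[i] for i in idx], [y[i] for i in idx], k))
    return results


def stability(X, y, results):
    """VIF, RCB and RMSDR per covariate, plus model selection frequencies (counts)."""
    n, k, n_rep = len(y), len(X[0]), len(results)
    beta, rss, xtx = ols(X, y, list(range(k)))  # global model with all covariates
    sigma2 = rss / (n - k - 1)
    out = {}
    for j in range(k):
        unit = [0.0] * (k + 1)
        unit[j + 1] = 1.0
        se = math.sqrt(sigma2 * solve(xtx, unit)[j + 1])  # SE from diag of (X'X)^-1
        # A covariate counts as selected when its resampled coefficient is nonzero
        sel = [r[j] for r in results if r[j] != 0.0]
        rcb = sum(sel) / len(sel) / beta[j + 1] - 1.0 if sel else float("nan")
        rmsd = math.sqrt(sum((r[j] - beta[j + 1]) ** 2 for r in results) / n_rep)
        out[j] = {"VIF": len(sel) / n_rep, "RCB": rcb, "RMSDR": rmsd / se}
    models = Counter(tuple(j for j in range(k) if r[j] != 0.0) for r in results)
    return out, models


if __name__ == "__main__":
    rng = random.Random(0)
    X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(150)]
    # Hypothetical data: x0 and x1 affect y, x2 and x3 are pure noise
    y = [1.0 * x[0] + 0.5 * x[1] + rng.gauss(0, 1) for x in X]
    for method in ("bootstrap", "subsample"):
        measures, models = stability(X, y, resample_selection(X, y, 100, method))
        for j, m in measures.items():
            print(method, f"x{j}", {key: round(v, 3) for key, v in m.items()})
        print(method, "most frequent models:", models.most_common(3))
```

Running the script prints the per-covariate measures and the three most frequent covariate sets for each resampling scheme. The sketch applies no size correction to the subsampling estimates, and the RCB becomes unstable when the full-model estimate is close to zero.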