4.5 Article

Variable selection and Bayesian model averaging in case-control studies

Covariate and confounder selection in case-control studies is often carried out using a statistical variable selection method, such as a two-step method or a stepwise method in logistic regression. Inference is then carried out conditionally on the selected model, but this ignores the model uncertainty implicit in the variable selection process, and so may underestimate uncertainty about relative risks. We report on a simulation study designed to be similar to actual case-control studies. This shows that p-values computed after variable selection can greatly overstate the strength of conclusions. For example, for our simulated case-control studies with 1000 subjects, of variables declared to be 'significant' with p-values between 0.01 and 0.05, only 49 per cent actually were risk factors when stepwise variable selection was used. We propose Bayesian model averaging as a formal way of taking account of model uncertainty in case-control studies. This yields an easily interpreted summary, the posterior probability that a variable is a risk factor, and our simulation study indicates this to be reasonably well calibrated in the situations simulated. The methods are applied and compared in the context of a case-control study of cervical cancer. Copyright (C) 2001 John Wiley & Sons, Ltd.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.5
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据