Article

Minimum sample size for developing a multivariable prediction model: PART II - binary and time-to-event outcomes

STATISTICS IN MEDICINE (2019)

Journal

STATISTICS IN MEDICINE

Volume 38, Issue 7, Pages 1276-1296

Publisher

WILEY

DOI: 10.1002/sim.7992

Keywords

binary and time-to-event outcomes; logistic and Cox regression; multivariable prediction model; pseudo R-squared; sample size; shrinkage

Categories

Mathematical & Computational Biology; Public, Environmental & Occupational Health; Medical Informatics; Medicine, Research & Experimental; Statistics & Probability

Funding

National Institute for Health Research School for Primary Care Research (NIHR SPCR)
Netherlands Organisation for Scientific Research [9120.8004, 918.10.615]
National Centre for Advancing Translational Sciences [UL1 TR002243]
NIHR Biomedical Research Centre, Oxford

Abstract

When designing a study to develop a new prediction model with binary or time-to-event outcomes, researchers should ensure their sample size is adequate in terms of the number of participants (n) and outcome events (E) relative to the number of predictor parameters (p) considered for inclusion. We propose that the minimum values of n and E (and subsequently the minimum number of events per predictor parameter, EPP) should be calculated to meet the following three criteria: (i) small optimism in predictor effect estimates as defined by a global shrinkage factor of ≥ 0.9, (ii) small absolute difference of ≤ 0.05 in the model's apparent and adjusted Nagelkerke's R², and (iii) precise estimation of the overall risk in the population. Criteria (i) and (ii) aim to reduce overfitting conditional on a chosen p, and require prespecification of the model's anticipated Cox-Snell R², which we show can be obtained from previous studies. The values of n and E that meet all three criteria provide the minimum sample size required for model development. Upon application of our approach, a new diagnostic model for Chagas disease requires an EPP of at least 4.8 and a new prognostic model for recurrent venous thromboembolism requires an EPP of at least 23. This reinforces why rules of thumb (e.g., 10 EPP) should be avoided. Researchers might additionally ensure the sample size gives precise estimates of key predictor effects; this is especially important when key categorical predictors have few events in some categories, as this may substantially increase the numbers required.
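
The three criteria can each be turned into a closed-form sample-size calculation, and the minimum n is the largest of the three. The sketch below covers binary outcomes only and assumes the formulas as we understand them from the body of the full paper (they are not reproduced in this abstract): criterion (i) n = p / ((S - 1) ln(1 - R²_CS / S)) with target shrinkage S = 0.9; criterion (ii) applies the same formula with S = R²_CS / (R²_CS + δ max(R²_CS)) and δ = 0.05; criterion (iii) n = (1.96 / 0.05)² φ(1 - φ) for outcome prevalence φ. The function names, the helper that recovers R²_CS from a likelihood-ratio statistic, and the illustrative inputs (p = 24, R²_CS = 0.288, φ = 0.174) are our own assumptions rather than anything given on this page. The time-to-event case, which needs the null Cox model's log-likelihood and the event rate, is not covered.

```python
import math


def max_r2_cs_binary(prevalence: float) -> float:
    """Largest achievable Cox-Snell R^2 for a binary outcome with this prevalence.

    max(R2_CS) = 1 - exp(2 * lnL_null / n), where the intercept-only model's
    log-likelihood per participant is phi*ln(phi) + (1 - phi)*ln(1 - phi).
    """
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    phi = prevalence
    ln_l_null_per_n = phi * math.log(phi) + (1 - phi) * math.log(1 - phi)
    return 1 - math.exp(2 * ln_l_null_per_n)


def r2_cs_from_lr(lr_statistic: float, n: int) -> float:
    """Hypothetical helper: Cox-Snell R^2 from a previous study's LR chi-square and n."""
    return 1 - math.exp(-lr_statistic / n)


def n_for_shrinkage(p: int, r2_cs: float, shrinkage: float) -> float:
    """Criterion (i): n giving an expected global shrinkage factor of `shrinkage`."""
    if not 0 < r2_cs < shrinkage:
        raise ValueError("need 0 < r2_cs < shrinkage")
    return p / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage))


def n_for_small_optimism(p: int, r2_cs: float, prevalence: float,
                         delta: float = 0.05) -> float:
    """Criterion (ii): apparent minus adjusted Nagelkerke R^2 of at most `delta`.

    The R^2 target is converted into a required shrinkage factor, which is then
    passed through the criterion (i) formula.
    """
    shrinkage = r2_cs / (r2_cs + delta * max_r2_cs_binary(prevalence))
    return n_for_shrinkage(p, r2_cs, shrinkage)


def n_for_overall_risk(prevalence: float, margin: float = 0.05) -> float:
    """Criterion (iii): 95% CI for the overall risk within +/- `margin`."""
    return (1.96 / margin) ** 2 * prevalence * (1 - prevalence)


def minimum_sample_size_binary(p, r2_cs, prevalence,
                               shrinkage=0.9, delta=0.05, margin=0.05):
    """Minimum n, expected events E and EPP that satisfy all three criteria."""
    per_criterion = (
        math.ceil(n_for_shrinkage(p, r2_cs, shrinkage)),
        math.ceil(n_for_small_optimism(p, r2_cs, prevalence, delta)),
        math.ceil(n_for_overall_risk(prevalence, margin)),
    )
    n = max(per_criterion)           # the binding (largest) criterion wins
    events = n * prevalence          # expected number of outcome events, not rounded
    return {"n_per_criterion": per_criterion, "n": n,
            "events": events, "epp": events / p}


if __name__ == "__main__":
    # Illustrative inputs only; not taken from this abstract.
    result = minimum_sample_size_binary(p=24, r2_cs=0.288, prevalence=0.174)
    print(result["n_per_criterion"], result["n"], round(result["epp"], 2))
    # expected (hand-checked): (623, 662, 221) 662 4.8
```

With these inputs criterion (ii) is the binding one, so n = 662, and E and EPP follow from n and the assumed prevalence. This is why the required EPP changes from one setting to the next rather than being fixed at a rule-of-thumb value.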
