Article

Minimum sample size for developing a multivariable prediction model: Part I - Continuous outcomes

STATISTICS IN MEDICINE (2019)

Journal

STATISTICS IN MEDICINE

Volume 38, Issue 7, Pages 1262-1275

Publisher

WILEY

DOI: 10.1002/sim.7993

Keywords

continuous outcome; linear regression; minimum sample size; multivariable prediction model; R-squared

Categories

Mathematical & Computational Biology; Public, Environmental & Occupational Health; Medical Informatics; Medicine, Research & Experimental; Statistics & Probability

Funding

National Institute for Health Research School for Primary Care Research (NIHR SPCR)
Netherlands Organisation for Scientific Research [9120.8004, 918.10.615]
CTSA award from the National Center for Advancing Translational Sciences [UL1 TR002243]
NIHR Biomedical Research Centre, Oxford

Abstract

In the medical literature, hundreds of prediction models are being developed to predict health outcomes in individuals. For continuous outcomes, typically a linear regression model is developed to predict an individual's outcome value conditional on values of multiple predictors (covariates). To improve model development and reduce the potential for overfitting, a suitable sample size is required in terms of the number of subjects (n) relative to the number of predictor parameters (p) for potential inclusion. We propose that the minimum value of n should meet the following four key criteria: (i) small optimism in predictor effect estimates as defined by a global shrinkage factor of ≥ 0.9; (ii) small absolute difference of ≤ 0.05 in the apparent and adjusted R²; (iii) precise estimation (a margin of error ≤ 10% of the true value) of the model's residual standard deviation; and similarly, (iv) precise estimation of the mean predicted outcome value (model intercept). The criteria require prespecification of the user's chosen p and the model's anticipated R² as informed by previous studies. The value of n that meets all four criteria provides the minimum sample size required for model development. In an applied example, a new model to predict lung function in African-American women using 25 predictor parameters requires at least 918 subjects to meet all criteria, corresponding to at least 36.7 subjects per predictor parameter. Even larger sample sizes may be needed to additionally ensure precise estimates of key predictor effects, especially when important categorical predictors have low prevalence in certain categories.
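To make criteria (i) and (ii) concrete, the following Python sketch searches for the smallest n that satisfies each one. It rests on assumptions that the abstract does not state: the anticipated adjusted R² of 0.2 is an illustrative input, the function names are hypothetical, and the shrinkage expression S = 1 + (p - 2) / (n ln(1 - R²_app)) is an assumed closed-form approximation rather than a formula quoted from the text above. Criterion (ii) uses only the standard identity linking apparent and adjusted R². Criteria (iii) and (iv) are not covered.

```python
import math


def apparent_r2(n: int, p: int, r2_adj: float) -> float:
    """Apparent R^2 implied by an anticipated adjusted R^2.

    Inverts the standard identity
    R2_adj = 1 - (1 - R2_app) * (n - 1) / (n - p - 1).
    """
    return (r2_adj * (n - p - 1) + p) / (n - 1)


def shrinkage(n: int, p: int, r2_adj: float) -> float:
    """Global shrinkage factor under an ASSUMED closed-form approximation."""
    return 1 + (p - 2) / (n * math.log(1 - apparent_r2(n, p, r2_adj)))


def n_for_shrinkage(p: int, r2_adj: float, target: float = 0.9) -> int:
    """Smallest n (searching upward from p + 2) with shrinkage >= target."""
    if not (0 < r2_adj < 1) or p < 1:
        raise ValueError("need 0 < r2_adj < 1 and p >= 1")
    n = p + 2
    while shrinkage(n, p, r2_adj) < target:
        n += 1
    return n


def n_for_r2_difference(p: int, r2_adj: float, max_diff: float = 0.05) -> int:
    """Smallest n with (apparent R^2 - adjusted R^2) <= max_diff.

    From the identity above, the difference equals p * (1 - r2_adj) / (n - 1).
    """
    if not (0 < r2_adj < 1) or p < 1:
        raise ValueError("need 0 < r2_adj < 1 and p >= 1")
    n = p + 2
    while p * (1 - r2_adj) / (n - 1) > max_diff:
        n += 1
    return n


if __name__ == "__main__":
    p, r2_adj = 25, 0.2  # r2_adj = 0.2 is an illustrative assumption
    n1 = n_for_shrinkage(p, r2_adj)
    n2 = n_for_r2_difference(p, r2_adj)
    print("criterion (i):", n1, "subjects,", round(n1 / p, 1), "per parameter")
    print("criterion (ii):", n2, "subjects")
```

The overall minimum sample size would be the largest n across all four criteria, so a full calculation would also need the residual standard deviation and intercept criteria, which require additional inputs not given in the abstract.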
