Article

Performance of logistic regression modeling: beyond the number of events per variable, the role of data structure

Journal

JOURNAL OF CLINICAL EPIDEMIOLOGY
Volume 64, Issue 9, Pages 993-1000

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.jclinepi.2010.11.012

Keywords

Model adequacy; Model building; Type I error; Power; Event per variable; Logistic regression

Objective: Logistic regression is commonly used in health research, and it is important to be sure that the parameter estimates can be trusted. A common problem occurs when the outcome has few events; in such cases, parameter estimates may be biased or unreliable. This study examined the relation between correctness of estimation and several data characteristics: number of events per variable (EPV), number of predictors, percentage of predictors that are highly correlated, percentage of predictors that are non-null, size of regression coefficients, and size of correlations.

Study Design: Simulation studies.

Results: In many situations, logistic regression modeling may pose substantial problems even if the number of EPV exceeds 10. Moreover, the number of EPV is not the only factor that affects the correctness of parameter estimation. Large regression coefficients and high correlations between the predictors may cause serious problems in the estimation process. Finally, power is generally very low, even at 20 EPV.

Conclusion: There is no single rule based on EPV that would guarantee accurate estimation of logistic regression parameters. Instead, the number of predictors, the probable size of the regression coefficients based on previous literature, and the correlations among the predictors must be taken into account as guidelines to determine the necessary sample size.

(C) 2011 Elsevier Inc. All rights reserved.
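
To make the EPV quantity discussed in the abstract concrete, the following minimal Python sketch computes EPV and the sample size implied by a target EPV. It is not taken from the paper: the function names and the example numbers (8 predictors, a 25% event rate) are hypothetical, and it assumes the common definition of EPV as the number of outcome events divided by the number of candidate predictors. As the abstract stresses, reaching a given EPV does not by itself guarantee reliable estimates.

```python
import math


def events_per_variable(n_events: int, n_predictors: int) -> float:
    """EPV under the assumed definition: outcome events / candidate predictors."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return n_events / n_predictors


def sample_size_for_epv(target_epv: float, n_predictors: int, event_rate: float) -> int:
    """Smallest n whose expected number of events reaches target_epv * n_predictors.

    event_rate is the anticipated proportion of subjects with the event
    (an assumption the analyst must supply, e.g. from previous literature).
    """
    if not 0 < event_rate <= 1:
        raise ValueError("event_rate must be in (0, 1]")
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return math.ceil(target_epv * n_predictors / event_rate)


if __name__ == "__main__":
    # Hypothetical scenario: 8 candidate predictors, 25% of subjects have the event.
    for epv in (10, 20):
        n = sample_size_for_epv(epv, n_predictors=8, event_rate=0.25)
        print(f"EPV {epv}: at least {n} subjects")
```

A calculation like this gives only a starting point for planning; the abstract's conclusion is that coefficient sizes and correlations among predictors also need to be considered.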
