Article

Consequences of ignoring clustering in linear regression

BMC MEDICAL RESEARCH METHODOLOGY (2021)

Journal

BMC MEDICAL RESEARCH METHODOLOGY

Volume 21, Issue 1, Pages -

Publisher

BMC

DOI: 10.1186/s12874-021-01333-7

Keywords

Clustering; Linear regression; Random intercept model; Consequences; Simulation; Comparison; Bias

Category

Health Care Sciences & Services

Funding

Colt Foundation
Versus Arthritis [22090]

Summary

This study found that failing to account for clustering in linear regression can lead to importantly erroneous conclusions, especially when the explanatory variable is continuous. The precision of effect estimates from the ordinary least squares (OLS) model was overestimated when the explanatory variable varied more between than within clusters, and somewhat underestimated when it was less clustered.

Abstract

Background

Clustering of observations is a common phenomenon in epidemiological and clinical research. Previous studies have highlighted the importance of using multilevel analysis to account for such clustering, but in practice, methods ignoring clustering are often employed. We used simulated data to explore the circumstances in which failure to account for clustering in linear regression could lead to importantly erroneous conclusions.

Methods

We simulated data following the random-intercept model specification under different scenarios of clustering of a continuous outcome and a single continuous or binary explanatory variable. We fitted random-intercept (RI) and ordinary least squares (OLS) models and compared effect estimates with the true value that had been used in simulation. We also assessed the relative precision of effect estimates, and explored the extent to which coverage by 95% confidence intervals and Type I error rates were appropriate.

Results

We found that effect estimates from both types of regression model were on average unbiased. However, deviations from the true value were greater when the outcome variable was more clustered. For a continuous explanatory variable, they tended also to be greater for the OLS than the RI model, and when the explanatory variable was less clustered. The precision of effect estimates from the OLS model was overestimated when the explanatory variable varied more between than within clusters, and was somewhat underestimated when the explanatory variable was less clustered. The cluster-unadjusted model gave poor coverage rates by 95% confidence intervals and high Type I error rates when the explanatory variable was continuous. With a binary explanatory variable, coverage rates by 95% confidence intervals and Type I error rates deviated from nominal values when the outcome variable was more clustered, but the direction of the deviation varied according to the overall prevalence of the explanatory variable, and the extent to which it was clustered.

Conclusions

In this study we identified circumstances in which application of an OLS regression model to clustered data is more likely to mislead statistical inference. The potential for error is greatest when the explanatory variable is continuous, and the outcome variable more clustered (intraclass correlation coefficient ≥ 0.01).
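The kind of simulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `simulate_type1_rate`, the number and size of clusters, the intraclass correlations, and the number of replicates are all assumptions chosen for illustration. Data are generated from a random-intercept specification with a true slope of zero, a simple OLS regression is fitted with its usual (cluster-unadjusted) standard error, and the share of replicates that reject the null at the nominal 5% level is reported. The RI model itself is not fitted here, since that would need a mixed-model library.

```python
import math
import random


def simulate_type1_rate(icc_y, icc_x, n_clusters=20, cluster_size=20,
                        n_sims=500, seed=1):
    """Share of replicates in which naive OLS rejects a true null slope.

    icc_y and icc_x are the assumed intraclass correlations of the outcome
    and of the continuous explanatory variable; both have total variance 1.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        xs, ys = [], []
        for _ in range(n_clusters):
            # Cluster-level components shared by all members of a cluster.
            u_y = rng.gauss(0.0, math.sqrt(icc_y))
            u_x = rng.gauss(0.0, math.sqrt(icc_x))
            for _ in range(cluster_size):
                x = u_x + rng.gauss(0.0, math.sqrt(1.0 - icc_x))
                # True slope is zero, so y does not depend on x.
                y = u_y + rng.gauss(0.0, math.sqrt(1.0 - icc_y))
                xs.append(x)
                ys.append(y)
        # Closed-form simple linear regression, ignoring the clustering.
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
        naive_se = math.sqrt(sse / (n - 2) / sxx)
        # Two-sided test at the nominal 5% level (normal approximation).
        if abs(slope / naive_se) > 1.96:
            rejections += 1
    return rejections / n_sims


# Hypothetical scenarios: clustered outcome versus unclustered outcome.
rate_clustered = simulate_type1_rate(icc_y=0.3, icc_x=0.5)
rate_independent = simulate_type1_rate(icc_y=0.0, icc_x=0.5)
print("Type I error, clustered outcome:  ", rate_clustered)
print("Type I error, unclustered outcome:", rate_independent)
```

Under these assumed settings the rejection rate with a clustered outcome should sit well above the nominal 5%, while the unclustered scenario should stay close to it, which is consistent with the abstract's finding for a continuous explanatory variable. Varying `icc_x` and `icc_y` gives a rough feel for how the two forms of clustering interact.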
