☆ 4.3 Article

Individual risk prediction: Comparing random forests with Cox proportional-hazards model by a simulation study

BIOMETRICAL JOURNAL (2023)

Journal

BIOMETRICAL JOURNAL

Volume 65, Issue 6, Pages -

Publisher

WILEY

DOI: 10.1002/bimj.202100380

Keywords

Cox model; machine learning; random survival forest; survival analysis

Categories

Mathematical & Computational Biology; Statistics & Probability

AI Summary

This study systematically compares the performance of random forest (RF), random survival forest (RSF), and the Cox proportional hazards (Cox-PH) model through a simulation study and a real data analysis. The results show that RF generally performs worst, while the relative performance of RSF and Cox-PH depends on the scenario and the underlying assumptions. Models that take survival time into account perform better in the real data analysis.

Abstract

With big data becoming widely available in healthcare, machine learning algorithms such as random forest (RF), which ignores time-to-event information, and random survival forest (RSF), which handles right-censored data, are used for individual risk prediction as alternatives to the Cox proportional hazards (Cox-PH) model. We aimed to systematically compare RF and RSF with Cox-PH. RSF with three split criteria [log-rank (RSF-LR), log-rank score (RSF-LRS), and maximally selected rank statistics (RSF-MSR)], RF, Cox-PH, and Cox-PH with splines (Cox-S) were evaluated through a simulation study based on real data. One hundred eighty scenarios were investigated assuming different associations between the predictors and the outcome (linear/linear and interactions/nonlinear/nonlinear and interactions), training sample sizes (500/1000/5000), censoring rates (50%/75%/93%), hazard functions (increasing/decreasing/constant), and numbers of predictors (seven, or 15 including noise variables). The methods' performance was evaluated with the time-dependent area under the curve and the integrated Brier score. In all scenarios, RF had the worst performance. In scenarios with a low number of events (≤ 70), Cox-PH was at least noninferior to RSF, whereas under the linearity assumption it outperformed RSF. In the presence of interactions, RSF performed better than Cox-PH as the number of events increased, whereas Cox-S reached at least similar performance to RSF under nonlinear effects. RSF-LRS performed slightly worse than RSF-LR and RSF-MSR when noise variables and interaction effects were included. When applied to real data, models incorporating survival time performed better. Although RSF algorithms are a promising alternative to the conventional Cox-PH model as data complexity increases, they require a higher number of events for training. In time-to-event analysis, algorithms that consider survival time should be used.
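
The scenario dimensions named in the abstract (hazard shape, censoring rate, number of predictors) can be illustrated with a small data generator. The following is a minimal sketch, not the authors' simulation design: it assumes a Weibull proportional-hazards model, standard-normal predictors, and independent exponential censoring, none of which are stated in the abstract. The function names `simulate_weibull_ph` and `censoring_fraction` and all parameter values are hypothetical.

```python
import math
import random


def simulate_weibull_ph(n, shape, scale, betas, censor_rate, seed=0):
    """Draw n right-censored records from a Weibull proportional-hazards model.

    Assumed hazard: h(t | x) = scale * shape * t**(shape - 1) * exp(x . betas)
    shape > 1 gives an increasing hazard, shape < 1 a decreasing hazard,
    and shape == 1 a constant hazard.
    Censoring times are assumed exponential with rate censor_rate.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        # Hypothetical predictors: independent standard-normal covariates.
        x = [rng.gauss(0.0, 1.0) for _ in betas]
        lin_pred = sum(b * xi for b, xi in zip(betas, x))
        # Inverse-transform sampling; u lies in (0, 1] so log(u) is finite.
        u = 1.0 - rng.random()
        event_time = (-math.log(u) / (scale * math.exp(lin_pred))) ** (1.0 / shape)
        censor_time = rng.expovariate(censor_rate)
        observed_time = min(event_time, censor_time)
        event_observed = event_time <= censor_time
        records.append((observed_time, event_observed, x))
    return records


def censoring_fraction(records):
    """Share of records whose event was not observed."""
    return sum(1 for _, event, _ in records if not event) / len(records)
```

Raising `censor_rate` relative to `scale` increases the censored share, so a target censoring rate could be approached by tuning that parameter for each scenario.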
