4.7 Article

The Optimal Machine Learning-Based Missing Data Imputation for the Cox Proportional Hazard Model

Journal

FRONTIERS IN PUBLIC HEALTH
Volume 9

Publisher

FRONTIERS MEDIA SA
DOI: 10.3389/fpubh.2021.680054

Keywords

machine learning; k-nearest neighbors imputation; random forest imputation; survival data simulation; cox proportional hazard model


This study examines four machine learning-based imputation strategies for survival data under different missing mechanisms and finds that non-parametric missForest is the only method that remains robust under all of them.
Adequate imputation of missing data preserves statistical power and helps avoid erroneous conclusions. In the era of big data, machine learning is a useful tool for inferring missing values. The root mean square error (RMSE) and the proportion of falsely classified entries (PFC) are two standard statistics for evaluating imputation accuracy. However, the use of different imputation types with the Cox proportional hazards model requires deliberate study, and their validity under different missing mechanisms is unknown. In this research, we propose supervised and unsupervised imputations and examine four machine learning-based imputation strategies. We conducted a simulation study under various scenarios, varying parameters such as sample size, missing rate, and missing mechanism. The results reveal the type-I error associated with each imputation technique in survival data. They show that non-parametric missForest with unsupervised imputation is the only robust method, with no inflated type-I error under any missing mechanism. In contrast, the other methods do not yield valid tests when the missing pattern is informative. Improperly conducted statistical analysis of data with missing values may lead to erroneous conclusions. This research provides a clear guideline for valid survival analysis using the Cox proportional hazard model with machine learning-based imputations.
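The two accuracy statistics named in the abstract can be illustrated with a minimal sketch. It assumes the usual convention that both are computed only over entries that were masked and then imputed, with RMSE applied to continuous variables and PFC to categorical ones. The function names and the toy values are hypothetical and are not taken from the paper.

```python
import math


def rmse(true_values, imputed_values):
    """Root mean square error between true and imputed continuous entries."""
    if len(true_values) != len(imputed_values) or not true_values:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - i) ** 2 for t, i in zip(true_values, imputed_values)]
    return math.sqrt(sum(squared) / len(squared))


def pfc(true_labels, imputed_labels):
    """Proportion of imputed categorical entries that differ from the truth."""
    if len(true_labels) != len(imputed_labels) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(1 for t, i in zip(true_labels, imputed_labels) if t != i)
    return wrong / len(true_labels)


# Hypothetical toy data: values that were masked, and what an imputer returned.
true_age = [50.0, 62.0, 47.0, 71.0]
imputed_age = [52.0, 60.0, 47.0, 75.0]
true_stage = ["I", "II", "III", "II"]
imputed_stage = ["I", "III", "III", "II"]

age_rmse = rmse(true_age, imputed_age)      # sqrt((4 + 4 + 0 + 16) / 4)
stage_pfc = pfc(true_stage, imputed_stage)  # 1 wrong label out of 4
print(f"RMSE = {age_rmse:.3f}, PFC = {stage_pfc:.2f}")
```

Lower values indicate more accurate imputation for both statistics. As the abstract notes, however, accuracy on these measures does not by itself establish that a downstream Cox model test keeps its nominal type-I error.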
