Article

Variable importance evaluation with personalized odds ratio for machine learning model interpretability with applications to electronic health records-based mortality prediction

STATISTICS IN MEDICINE (2023)

Journal

STATISTICS IN MEDICINE

Volume 42, Issue 6, Pages 761-780

Publisher

WILEY

DOI: 10.1002/sim.9642

Keywords

electronic health records; interpretable machine learning; predictive modeling; variable importance

Categories

Mathematical & Computational Biology; Public, Environmental & Occupational Health; Medical Informatics; Medicine, Research & Experimental; Statistics & Probability

AI summary

This study investigates the interpretability and variable importance of machine learning models. A novel and computationally efficient evaluation framework called VIPOR is proposed. VIPOR is a model-agnostic method that can evaluate variable importance locally and globally using the concept of personalized odds ratio. The method groups predictors into different categories and ranks their importance based on different statistics. The proposed method is demonstrated using real-world electronic health records data and compared with other interpretation methods.

Abstract

Machine learning models remain difficult to interpret in practical applications, even when their predictive performance is excellent. This study investigates model interpretability and variable importance for well-performing supervised machine learning models. Building on the widely accepted concept of the odds ratio (OR), we propose a novel and computationally efficient Variable Importance evaluation framework based on the Personalized Odds Ratio (VIPOR). VIPOR is a model-agnostic interpretation method that evaluates variable importance both locally and globally. Locally, variable importance is quantified by the personalized odds ratio (POR), which accounts for subject heterogeneity in machine learning. Globally, we use a hierarchical tree to sort the predictors into five groups: completely positive, completely negative, positive dominated, negative dominated, and neutral. Within each group, predictors are ranked by different statistics of the PORs across subjects, chosen to suit the application. For illustration, we apply VIPOR to interpret a multilayer perceptron (MLP) model that predicts mortality in subarachnoid hemorrhage (SAH) patients using real-world electronic health records (EHR) data. We compare the important variables derived from the MLP with those from other machine learning models, including tree-based models and L1-regularized logistic regression. VIPOR identifies the top important variables consistently across the different prediction models. Comparisons with existing interpretation methods are also conducted and discussed using publicly available data sets.
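
To make the local and global steps concrete, here is a minimal Python sketch. The abstract does not give the formal POR definition or the hierarchical tree, so everything below is an assumption for illustration: the POR is taken to be the ratio of predicted odds after raising one predictor by `delta` to the predicted odds at the observed values, and the five groups are assigned by a simple sign-counting rule that stands in for the paper's tree. The names `personalized_odds_ratios`, `group_predictor`, and `toy_predict_proba` are hypothetical, and the toy model and data are invented.

```python
import numpy as np


def personalized_odds_ratios(predict_proba, X, delta=1.0, eps=1e-12):
    """Return an (n_subjects, n_features) array of illustrative PORs.

    Assumption (not taken from the paper): the POR of feature j for subject i
    is the predicted odds after raising x_ij by `delta`, divided by the
    predicted odds at the observed x_i.
    """
    X = np.asarray(X, dtype=float)

    def odds(p):
        # Clip to avoid division by zero for probabilities at 0 or 1.
        p = np.clip(p, eps, 1.0 - eps)
        return p / (1.0 - p)

    base_odds = odds(predict_proba(X))
    por = np.empty_like(X)
    for j in range(X.shape[1]):
        X_shift = X.copy()
        X_shift[:, j] += delta
        por[:, j] = odds(predict_proba(X_shift)) / base_odds
    return por


def group_predictor(por_column, tol=1e-8):
    """Assign one predictor to a group from the signs of its log-PORs.

    This sign-counting rule is a simplified stand-in for the hierarchical
    tree mentioned in the abstract.
    """
    log_por = np.log(por_column)
    pos = np.mean(log_por > tol)   # share of subjects with POR > 1
    neg = np.mean(log_por < -tol)  # share of subjects with POR < 1
    if pos == 1.0:
        return "completely positive"
    if neg == 1.0:
        return "completely negative"
    if pos > neg:
        return "positive dominated"
    if neg > pos:
        return "negative dominated"
    return "neutral"


def toy_predict_proba(X):
    # Hypothetical black-box model: a logistic model with an x2*x3
    # interaction, so the effect of x2 and x3 differs between subjects.
    # Feature x4 is ignored by the model.
    z = -1.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1] + 0.6 * X[:, 2] * X[:, 3]
    return 1.0 / (1.0 + np.exp(-z))


# Invented data: 4 subjects, 5 predictors.
X = np.array([
    [0.5, 1.0, -0.2, 1.0, 3.0],
    [1.0, 0.0, -0.3, 1.0, 1.0],
    [-0.5, 2.0, 0.4, -1.0, 0.0],
    [0.0, 0.5, -0.1, 2.0, -2.0],
])

por = personalized_odds_ratios(toy_predict_proba, X)
groups = [group_predictor(por[:, j]) for j in range(X.shape[1])]

# One possible within-group ranking statistic: median absolute log-POR.
median_abs_log_por = np.median(np.abs(np.log(por)), axis=0)

for j, (g, m) in enumerate(zip(groups, median_abs_log_por)):
    print(f"x{j}: {g} (median |log POR| = {m:.3f})")
```

Because the function only calls `predict_proba`, any fitted classifier that returns a probability per subject could be substituted for the toy model, which is the sense in which such a procedure is model-agnostic. The choice of `delta` and of the ranking statistic would depend on the application.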
