Article

An AUC-maximizing classifier for skewed and partially labeled data with an application in clinical prediction modeling

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 278, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.knosys.2023.110831

Keywords

Proximal support vector machines; Imbalanced learning; Semi-supervised learning; AUC maximization learning; Clinical prediction modeling; COVID-19 prediction

This paper proposes a novel classifier, PSVM-AUCMax, for partially labeled and skewed datasets. PSVM-AUCMax improves prediction performance by directly maximizing an AUC-based metric. Its merits include enhanced generalization capability, a simpler model selection process, and an analytical solution of the same form as that of traditional PSVM.
Partially labeled and skewed datasets are common in many applications, including healthcare, because data collection and annotation are costly and time-consuming. However, training machine learning classifiers on such data can undermine their prediction performance. In this paper, we propose a novel classifier that addresses this problem by focusing on the Area Under the Curve (AUC), which is widely recognized as a more robust performance metric for skewed datasets than metrics such as accuracy and error rate. The new classifier, PSVM-AUC Maximizer (PSVM-AUCMax), is based on Proximal Support Vector Machines (PSVM) and directly maximizes a new AUC-based metric in its learning objective. PSVM-AUCMax has several merits. First, because it directly integrates the maximization of the proposed AUC-based metric, PSVM-AUCMax can be proved to have enhanced generalization capability on partially labeled and skewed datasets. Second, it simplifies model selection by requiring fewer hyperparameters to tune. Third, the analytical solution of PSVM-AUCMax retains the same form as that of traditional PSVM, preserving PSVM's advantages such as fast updating in incremental learning scenarios. The efficacy of PSVM-AUCMax has been demonstrated through extensive experiments on several public datasets and a healthcare case study using data collected at the Mayo Clinic in the US. In the healthcare case study, we used PSVM-AUCMax to develop a clinical prediction model for forecasting composite outcomes in hospitalized COVID-19 patients, and the model yielded promising results.
© 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
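The abstract states that PSVM-AUCMax keeps the analytical solution form of traditional PSVM, which is what makes fast incremental updating possible. The sketch below illustrates that property for the traditional linear PSVM formulation of Fung and Mangasarian. It does not implement PSVM-AUCMax itself, because the paper's AUC-based objective is not given here. The class name `IncrementalLinearPSVM`, its `partial_fit` method, the regularization parameter `nu`, and the toy data are hypothetical illustration choices. The closed-form solution depends on the data only through H^T H and H^T D e, and both are sums over training rows. New batches can therefore be folded in without revisiting old data. The helper `auc_mann_whitney` computes the standard AUC, not the paper's new metric, as the fraction of correctly ranked positive/negative pairs.

```python
import numpy as np


class IncrementalLinearPSVM:
    """Traditional linear proximal SVM (hypothetical helper), NOT PSVM-AUCMax.

    Solves  min_{w,gamma} (nu/2)||e - D(Aw - e*gamma)||^2 + (1/2)(||w||^2 + gamma^2)
    in closed form: [w; gamma] = (I/nu + H^T H)^{-1} H^T D e, with H = [A, -e].
    """

    def __init__(self, n_features: int, nu: float = 1.0):
        self.nu = nu
        d = n_features + 1
        # Sufficient statistics: both are sums over training rows, so new
        # batches can be accumulated without revisiting earlier data.
        self.HtH = np.zeros((d, d))
        self.HtDe = np.zeros(d)

    def partial_fit(self, A: np.ndarray, y: np.ndarray) -> "IncrementalLinearPSVM":
        """Accumulate a batch A (m x n) with labels y in {-1, +1}."""
        H = np.hstack([A, -np.ones((A.shape[0], 1))])
        self.HtH += H.T @ H
        self.HtDe += H.T @ y  # D e is just the label vector y
        return self

    def coef(self):
        """Solve the (n+1) x (n+1) linear system for w and gamma."""
        d = self.HtH.shape[0]
        z = np.linalg.solve(np.eye(d) / self.nu + self.HtH, self.HtDe)
        return z[:-1], z[-1]

    def decision_function(self, X: np.ndarray) -> np.ndarray:
        w, gamma = self.coef()
        return X @ w - gamma  # classify by the sign; rank by the raw value


def auc_mann_whitney(scores: np.ndarray, y: np.ndarray) -> float:
    """Standard AUC: fraction of (positive, negative) pairs ranked correctly, ties = 1/2."""
    pos, neg = scores[y == 1], scores[y == -1]
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg)))


# Hypothetical skewed toy data: 190 negatives and 10 positives.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(190, 2)), rng.normal(2.0, 1.0, size=(10, 2))])
y = np.concatenate([-np.ones(190), np.ones(10)])

# Fitting in two batches gives the same solution as fitting on all data at once.
model = IncrementalLinearPSVM(n_features=2, nu=1.0)
model.partial_fit(X[:100], y[:100]).partial_fit(X[100:], y[100:])
print("AUC:", auc_mann_whitney(model.decision_function(X), y))
```

AUC depends only on how scores are ranked, so it is unaffected by the class ratio in a way that accuracy is not. This is the reason the abstract gives for targeting AUC on skewed data.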
