4.7 Article

An AUC-maximizing classifier for skewed and partially labeled data with an application in clinical prediction modeling

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 278

Publisher

ELSEVIER
DOI: 10.1016/j.knosys.2023.110831

Keywords

Proximal support vector machines; Imbalanced learning; Semi-supervised learning; AUC maximization learning; Clinical prediction modeling; COVID-19 prediction

This paper proposes a novel classifier, PSVM-AUCMax, for partially labeled and skewed datasets. PSVM-AUCMax improves prediction performance by directly maximizing an AUC-based metric. Its merits include enhanced generalization capability, a simpler model selection process, and an analytical solution of the same form as that of traditional PSVM.
Partially labeled and skewed datasets are common in many applications, including healthcare, due to the high costs and time constraints of data collection and annotation. However, training machine learning classifiers on such data can degrade their predictive performance. In this paper, we propose a novel classifier to address this problem by focusing on the Area Under the Curve (AUC), which is widely recognized as a more robust performance metric for skewed datasets than other metrics such as accuracy and error rate. We introduce a new classifier called PSVM-AUC Maximizer (PSVM-AUCMax), which is based on Proximal Support Vector Machines (PSVM) and directly maximizes a new AUC-based metric in its learning objective. PSVM-AUCMax has several merits. First, by directly integrating the maximization of the proposed AUC-based metric, PSVM-AUCMax can be proved to have enhanced generalization capability on partially labeled and skewed datasets. Second, it simplifies the model selection process by requiring fewer hyperparameters to tune. Third, the analytical solution of PSVM-AUCMax retains the same form as that of traditional PSVM, preserving its advantages such as fast incremental updating in incremental learning scenarios. The efficacy of PSVM-AUCMax has been demonstrated through extensive experiments on several public datasets and a healthcare case study using data collected at the US Mayo Clinic. In the healthcare case study, we used PSVM-AUCMax to develop a clinical prediction model for forecasting composite outcomes in hospitalized COVID-19 patients, which yielded promising results. © 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
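
For readers unfamiliar with the baseline, the following is a minimal sketch of the traditional linear PSVM that the abstract builds on, not of PSVM-AUCMax itself, whose objective is not given here. It assumes the standard closed-form PSVM solution, (I/nu + E'E) z = E'd with E = [A, -e] and z = [w; gamma]. It illustrates why the analytical form supports incremental updating: the solution depends on the data only through the sums E'E and E'd, which can be accumulated batch by batch. The function names (`psvm_stats`, `psvm_solve`, `pairwise_auc`) and the parameter `nu` are illustrative choices, and the AUC is computed as the fraction of correctly ranked positive-negative pairs.

```python
import numpy as np

def psvm_stats(A, d):
    """Sufficient statistics of a batch for linear PSVM.
    A: (m, n) feature matrix; d: (m,) labels in {-1, +1}."""
    E = np.hstack([A, -np.ones((A.shape[0], 1))])  # E = [A, -e]
    return E.T @ E, E.T @ d                        # E'E and E'De (D e = d)

def psvm_solve(EtE, Etd, nu=1.0):
    """Solve (I/nu + E'E) z = E'd for z = [w; gamma].
    The decision rule is sign(x @ w - gamma)."""
    z = np.linalg.solve(np.eye(EtE.shape[0]) / nu + EtE, Etd)
    return z[:-1], z[-1]

def pairwise_auc(scores, d):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    ties count as half."""
    diff = scores[d == 1][:, None] - scores[d == -1][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

# Synthetic skewed data (an assumption for illustration): 20 positives, 400 negatives.
rng = np.random.default_rng(0)
A = np.vstack([rng.normal(2.0, 1.0, (20, 2)), rng.normal(-2.0, 1.0, (400, 2))])
d = np.concatenate([np.ones(20), -np.ones(400)])

# Batch fit on all data at once.
w, gamma = psvm_solve(*psvm_stats(A, d))

# Incremental fit: add up the statistics of two batches, then solve once.
S1, S2 = psvm_stats(A[:200], d[:200]), psvm_stats(A[200:], d[200:])
w_inc, gamma_inc = psvm_solve(S1[0] + S2[0], S1[1] + S2[1])

print("AUC of batch fit:", pairwise_auc(A @ w - gamma, d))
print("Incremental matches batch:", np.allclose(w, w_inc))
```

Because the AUC depends only on how positives are ranked against negatives, it is insensitive to the class ratio in a way that accuracy is not, which is the motivation the abstract gives for optimizing it on skewed data.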
