Article

Class-overlap undersampling based on Schur decomposition for Class-imbalance problems

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 221

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2023.119735

Keywords

Schur decomposition; Class imbalance problems; Class-overlap; Positive and negative inertia index; Undersampling


The class-imbalance problem is a significant issue in machine learning and data mining. Many methods have been developed to address this problem, but they usually overlook global similarity. To find the global similarity of datasets, a novel Schur decomposition class-overlap undersampling method (SDCU) is proposed. Experimental results demonstrate the superior performance of SDCU compared to other state-of-the-art methods on various classifiers.
The class-imbalance problem troubles machine learning and data mining researchers and is ubiquitous in real-world applications. Many methods have been developed to deal with it. Many researchers believe that imbalance in the class distribution is not, by itself, the main factor affecting the performance of a classification model. Rather, when class-distribution imbalance coexists with problems such as class overlap, small disjuncts, and noise, model performance is severely affected. For class-distribution imbalance combined with class overlap, existing methods mainly use nearest neighbors to obtain the local similarity of instances within a local region, and use it to find the overlapping regions in the dataset. To the best of our knowledge, no researchers have considered global similarity. To find the global similarity of datasets, a novel Schur decomposition class-overlap undersampling method (SDCU) is proposed. SDCU attempts to identify potentially overlapping instances based on global similarity, and is the first to use matrix decomposition to address class overlap in class-imbalanced data. We conduct comparative experiments on 46 publicly available real datasets. With AUC as the performance evaluation metric, SDCU shows clear advantages over other state-of-the-art methods on three different types of classifiers: SVM, CART, and 3NN. The results of the Friedman ranking and Holm's post-hoc test also confirm these conclusions.
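
The abstract names the Friedman ranking and Holm's post-hoc test but does not spell out how they were computed. The following is a minimal sketch of the textbook form of that comparison, not the authors' evaluation code. It assumes the proposed method is treated as the control, uses a normal approximation for the pairwise z-statistics, and omits any tie correction in the Friedman statistic. The AUC values in `toy_auc` and all function names are made up purely for illustration and are not results from the paper.

```python
import math

def average_ranks(scores):
    """Mean rank of each method over all datasets (rank 1 = highest AUC).

    scores: list of rows, one row per dataset, one AUC per method.
    Tied methods share the mean of the ranks they occupy.
    """
    k = len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])
        i = 0
        while i < k:
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
            for t in range(i, j + 1):
                totals[order[t]] += shared
            i = j + 1
    return [t / len(scores) for t in totals]

def friedman_statistic(ranks, n_datasets):
    """Friedman chi-square statistic (no tie correction)."""
    k = len(ranks)
    sum_sq = sum(r * r for r in ranks)
    return 12 * n_datasets / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4)

def holm_vs_control(ranks, n_datasets, control=0, alpha=0.05):
    """Holm step-down test of every method against the control method.

    Returns {method_index: True if the null hypothesis is rejected}.
    """
    k = len(ranks)
    se = math.sqrt(k * (k + 1) / (6 * n_datasets))
    pvals = []
    for j in range(k):
        if j == control:
            continue
        z = (ranks[j] - ranks[control]) / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        pvals.append((p, j))
    pvals.sort()
    m = len(pvals)
    rejected = {}
    still_rejecting = True
    for i, (p, j) in enumerate(pvals):
        # Once one hypothesis is retained, all larger p-values are retained too.
        still_rejecting = still_rejecting and p <= alpha / (m - i)
        rejected[j] = still_rejecting
    return rejected

# Hypothetical AUC table: 4 datasets (rows) x 3 methods (columns).
# Column 0 plays the role of the control method; the numbers are invented.
toy_auc = [
    [0.90, 0.85, 0.80],
    [0.88, 0.82, 0.84],
    [0.95, 0.91, 0.91],
    [0.79, 0.75, 0.70],
]
ranks = average_ranks(toy_auc)
stat = friedman_statistic(ranks, len(toy_auc))
rejected = holm_vs_control(ranks, len(toy_auc), control=0)
print(ranks, stat, rejected)
```

With only four toy datasets the normal approximation is crude. A comparison over 46 datasets, as in the abstract, is the setting in which this procedure is normally applied.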
