Article

OUBoost: boosting based over and under sampling technique for handling imbalanced data

Journal

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s13042-023-01839-0

Keywords

Imbalanced data classification; Class imbalanced problem; Over-sampling; Under-sampling; Imbalance ratio; Boosting


Most real-world datasets are imbalanced, which leads to biased classifiers favoring the majority class. This paper proposes a new under-sampling technique called Peak clustering and a boosting-based algorithm named OUBoost, which combines Peak under-sampling with SMOTE over-sampling. OUBoost selects useful examples from the majority class and creates synthetic examples for the minority class. Experimental results on 30 imbalanced datasets demonstrate the improved prediction performance in the minority class using OUBoost. Time comparisons and statistical tests further analyze the proposed algorithm.
Real-world datasets are often imbalanced. Learning from datasets where one class (the minority) has far fewer samples than another (the majority) produces classifiers biased toward the majority class. Overall prediction accuracy on imbalanced datasets can be higher than 90%, while accuracy for the minority class is considerably lower. In this paper, we first propose a new under-sampling technique for the majority class of imbalanced datasets, based on the Peak clustering method. We then propose OUBoost, a novel boosting-based algorithm for learning from imbalanced datasets that combines the proposed Peak under-sampling algorithm with an over-sampling technique (SMOTE) inside the boosting procedure. In OUBoost, misclassified examples are not given equal weights. OUBoost selects useful examples from the majority class and creates synthetic examples for the minority class; in doing so, it indirectly updates the sample weights. We designed experiments using several evaluation metrics, such as Recall, MCC, G-mean, and F-score, on 30 real-world imbalanced datasets. The results show that OUBoost improves prediction performance on the minority class for most of the datasets used. We further report time comparisons and statistical tests to analyze the proposed algorithm in more detail.
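The gap between overall accuracy and minority-class performance can be illustrated with a short sketch. This is not the paper's evaluation code. It only applies the standard confusion-matrix definitions of the metrics named in the abstract (Recall, MCC, G-mean, F-score), plus the common majority-to-minority definition of the imbalance ratio. The function name `imbalance_metrics` and the example counts are hypothetical and chosen purely for illustration.

```python
import math


def imbalance_metrics(tp, fn, fp, tn):
    """Standard metrics for the minority (positive) class.

    tp, fn: minority samples classified correctly / incorrectly
    fp, tn: majority samples classified incorrectly / correctly
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0          # minority hit rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0     # majority hit rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    pr_sum = precision + recall
    f_score = 2 * precision * recall / pr_sum if pr_sum else 0.0
    g_mean = math.sqrt(recall * specificity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Imbalance ratio: majority count divided by minority count
    # (a common definition, assumed here).
    imbalance_ratio = (tn + fp) / (tp + fn) if (tp + fn) else float("inf")
    return {
        "accuracy": accuracy,
        "recall": recall,
        "f_score": f_score,
        "g_mean": g_mean,
        "mcc": mcc,
        "imbalance_ratio": imbalance_ratio,
    }


# Hypothetical classifier on 950 majority and 50 minority samples:
# it finds only 10 of the 50 minority samples.
metrics = imbalance_metrics(tp=10, fn=40, fp=5, tn=945)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, accuracy is above 90% even though only one in five minority samples is detected, which is why the abstract reports Recall, MCC, G-mean, and F-score rather than accuracy alone.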
