
OUBoost: boosting based over and under sampling technique for handling imbalanced data

INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS (2023)

Journal

INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS

Volume 14, Issue 10, Pages 3393-3411

Publisher

SPRINGER HEIDELBERG

DOI: 10.1007/s13042-023-01839-0

Keywords

Imbalanced data classification; Class imbalanced problem; Over-sampling; Under-sampling; Imbalance ratio; Boosting

Category

Computer Science, Artificial Intelligence

Abstract

Most real-world datasets are imbalanced. Learning from a dataset in which one class (the minority) has far fewer samples than another (the majority) produces classifiers biased toward the majority class: overall prediction accuracy on imbalanced datasets is often higher than 90%, while accuracy on the minority class is considerably lower. In this paper, we first propose a new under-sampling technique for imbalanced datasets that selects majority-class samples using the Peak clustering method. We then propose OUBoost, a novel boosting-based algorithm for learning from imbalanced datasets that combines the proposed Peak under-sampling algorithm with the SMOTE over-sampling technique inside the boosting procedure. In OUBoost, misclassified examples are not given equal weights. OUBoost selects useful examples from the majority class and creates synthetic examples for the minority class, thereby updating sample weights indirectly. We designed experiments on 30 real-world imbalanced datasets using several evaluation metrics, including Recall, MCC, G-mean, and F-score. The results show that OUBoost improves prediction performance on the minority class for most of the datasets used. We further report time comparisons and statistical tests to analyze the proposed algorithm in more detail.
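OUBoost's over-sampling side relies on SMOTE, which creates a synthetic minority example by interpolating between a minority sample and one of its nearest minority-class neighbours. The following minimal Python sketch illustrates that interpolation step, assuming plain lists of numeric features. The function name `smote` and its parameters are hypothetical. It is a simplified variant that picks base samples at random rather than iterating over every minority sample. It is not the authors' implementation and omits both the Peak under-sampling and the boosting loop.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical helper: generate n_synthetic SMOTE-style points from a list
    of minority-class feature vectors (simplified, random base selection)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` (Euclidean), excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        # New point lies on the line segment between base and neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
    new_points = smote(minority, n_synthetic=3, k=2)
    print(len(new_points))  # 3
```

Because each synthetic point lies on a segment between two real minority samples, over-sampling fills in the minority region rather than duplicating existing points. Generating new points this way, instead of increasing the weights of existing ones, is what lets a sampling-based booster adjust the class balance without reweighting examples directly.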
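The abstract's point that overall accuracy can exceed 90% while the minority class is poorly predicted is why the evaluation uses Recall, MCC, G-mean, and F-score rather than accuracy. The sketch below computes these metrics from a binary confusion matrix. It assumes that the minority class is the positive class, that "F-score" means F1, and that undefined ratios (zero denominators) are reported as 0. The paper may use different conventions, and the function name `imbalance_metrics` is hypothetical.

```python
import math


def imbalance_metrics(tp, fp, tn, fn):
    """Hypothetical helper: Recall, F1, G-mean and MCC for a binary confusion
    matrix, with the minority class treated as the positive class."""
    recall = tp / (tp + fn) if tp + fn else 0.0          # true positive rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0     # true negative rate
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    g_mean = math.sqrt(recall * specificity)             # geometric mean of TPR and TNR
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation coefficient
    return {"recall": recall, "f_score": f_score, "g_mean": g_mean, "mcc": mcc}


if __name__ == "__main__":
    # Always predicting the majority class on 950 majority and 50 minority
    # samples gives 95% accuracy, yet every minority-focused metric is 0.
    print(imbalance_metrics(tp=0, fp=0, tn=950, fn=50))
    # {'recall': 0.0, 'f_score': 0.0, 'g_mean': 0.0, 'mcc': 0.0}
```

G-mean and MCC both collapse to 0 for the trivial majority-class predictor. This is why they expose minority-class failures that accuracy hides.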