Article

TDMO: Dynamic multi-dimensional oversampling for exploring data distribution based on extreme gradient boosting learning

Related references

Note: Only part of the references are listed.
Article Computer Science, Artificial Intelligence

Optimal Entropy Genetic Fuzzy-C-Means SMOTE (OEGFCM-SMOTE)

Karim El Moutaouakil et al.

Summary: This paper proposes Optimal Entropy Genetic Fuzzy-C-Means SMOTE (OEGFCM-SMOTE), a method based on the original mathematical model, soft clustering, and evolutionary optimization, for handling classification problems on imbalanced data sets. OEGFCM-SMOTE addresses the sensitivity issue of the K-means method by generating synthetic samples in safe regions based on Fuzzy-C-Means, and it selects optimal parameters based on entropy to minimize noise. Experimental results demonstrate that OEGFCM-SMOTE consistently outperforms other popular oversampling methods in multiple performance measures.


Article Computer Science, Artificial Intelligence

Grouping-based Oversampling in Kernel Space for Imbalanced Data Classification

Jinjun Ren et al.

Summary: Class-imbalanced classification is a challenging problem where traditional classifiers exhibit bias towards majority classes and generate incorrect predictions. Existing algorithms struggle with the issue of class overlapping. This paper proposes a grouping scheme for minority class samples based on their possibilities of appearing in overlapping regions in the feature space. A new oversampling method is then proposed to generate samples far away from the overlapping region and rectify the decision boundary. An effective classification algorithm for imbalanced data is developed based on these techniques. Extensive experiments demonstrate the superiority of the proposed algorithm over seventeen benchmark algorithms, particularly on highly imbalanced datasets.


Article Computer Science, Artificial Intelligence

Discriminatory Label-specific Weights for Multi-label Learning with Missing Labels

Reshma Rastogi et al.

Summary: This article proposes a method for addressing the class imbalance problem in multi-label learning with missing labels. The method constructs a label weight matrix and utilizes discriminatory label weights and auxiliary label correlations to guide the completion of missing labels and learning of the multi-label classifier.


Article Medicine, General & Internal

A Comparison of Techniques for Class Imbalance in Deep Learning Classification of Breast Cancer

Ricky Walsh et al.

Summary: Tools based on deep learning models have been developed to assist radiologists in diagnosing breast cancer from mammograms. However, the imbalance of malignant and benign samples in the training datasets can lead to biased models. This study evaluates different techniques to address this class imbalance issue and shows that they can counteract the bias towards the majority class. However, these techniques do not improve the model's performance in terms of AUC-ROC, except for the synthetic lesion generation approach.


Article Computer Science, Interdisciplinary Applications

A no-tardiness job shop scheduling problem with overtime consideration and the solution approaches

Shuangyuan Shi et al.

Summary: This paper proposes a no-tardiness job shop scheduling problem with overtime work consideration (NTJSSP-OW) to minimize the total earliness inventory and overtime work costs simultaneously. A mathematical model is formulated and a hybrid genetic algorithm with simulated annealing (GASA) is proposed to solve it. Comprehensive experiments and statistical tests show that the GASA algorithm converges faster and has better global search ability than competing algorithms.


Article Computer Science, Artificial Intelligence

Benchmarking state-of-the-art imbalanced data learning approaches for credit scoring

Cuiqing Jiang et al.

Summary: This paper discusses the solutions for the class imbalance problem in credit scoring, compares the performance of traditional approaches and generative adversarial networks (GANs) in credit scoring, and provides some recommendations with the help of benchmark analysis.


Article Computer Science, Information Systems

DML-PL: Deep metric learning based pseudo-labeling framework for class imbalanced semi-supervised learning

Mi Yan et al.

Summary: This study proposes a deep metric learning-based pseudo-labeling (DML-PL) framework that addresses both class imbalance and insufficient labeled data problems. Through an iterative self-training strategy, a deep metric network is trained to learn compact feature representations of labeled and unlabeled data, generating reliable pseudo-labels and improving training accuracy.


Article Computer Science, Information Systems

CHSMOTE: Convex hull-based synthetic minority oversampling technique for alleviating the class imbalance problem

Xiaohan Yuan et al.

Summary: In this paper, a novel and simple convex hull-based SMOTE (CHSMOTE) algorithm is proposed to overcome the weaknesses of SMOTE and alleviate the class imbalance problem. CHSMOTE selects the border minority samples as initial samples, identifies the synthesis area based on the convex hull, and generates more effective samples by enlarging the generation range. Extensive experiments demonstrate the effectiveness and superiority of the proposed algorithm.


Review Computer Science, Information Systems

K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data

Abiodun M. Ikotun et al.

Summary: Advances in data collection techniques have enabled the accumulation of large quantities of data. The K-means algorithm, while popular, has challenges such as determining the number of clusters and detecting non-Euclidean shapes. Research efforts have been made to improve its performance and robustness.


Article Computer Science, Information Systems

Subspace-based minority oversampling for imbalance classification

Tianjun Li et al.

Summary: This paper proposes a new over-sampling method called Subspace-based Minority Over-Sampling (SMO) to address the class imbalance problem. The SMO approach extracts common and unique characteristics of each category of samples using subspace and achieves balanced data by over-sampling the common part and expanding the unique part. Experimental results show that SMO outperforms classical and newly designed over-sampling algorithms and can be used to generate simple images.


Article Chemistry, Multidisciplinary

A Powerful Predicting Model for Financial Statement Fraud Based on Optimized XGBoost Ensemble Learning Technique

Amal Al Ali et al.

Summary: This study aims to develop a better Financial Statement Fraud (FSF) detection model using data from publicly available financial statements of firms in the MENA region. An FSF model was developed using the XGBoost algorithm, which outperformed other algorithms in this study. The class imbalance in the dataset was addressed using the SMOTE algorithm. The optimized XGBoost algorithm achieved a final accuracy of 96.05% in detecting FSF.


Article Computer Science, Artificial Intelligence

Classification and prediction of spinal disease based on the SMOTE-RFE-XGBoost model

Biao Zhang et al.

Summary: The article proposes a SMOTE-RFE-XGBoost model, which uses the physical angle of human bone as a research index to predict spinal diseases. The model utilizes the SMOTE algorithm to handle category imbalance, and employs LASSO, tree-based feature selection, and recursive feature elimination for feature selection. Various classification models are used to classify the samples and rank the feature importance. The SMOTE-RFE-XGBoost model achieves the best classification performance with an accuracy of 97.56%, MSE of 0.1111, and F1 value of 0.8696. The indicators of lumbar slippage, cervical tilt, pelvic radius, and pelvic tilt are found to be more important.


Article Physics, Multidisciplinary

Relative Density-Based Intuitionistic Fuzzy SVM for Class Imbalance Learning

Cui Fu et al.

Summary: In this paper, a novel relative density-based intuitionistic fuzzy support vector machine (RIFSVM) algorithm is proposed for imbalanced learning in the presence of noise and outliers. The algorithm estimates the intuitionistic fuzzy numbers using the relative density and assigns different fuzzy values to majority class and minority class instances to improve classification performance.

ENTROPY (2023)

Article Computer Science, Artificial Intelligence

KNNOR: An oversampling technique for imbalanced datasets

Ashhadul Islam et al.

Summary: This study introduces an advanced algorithm called KNNOR to address class imbalance by studying the compactness and location of the minority class, identifying critical and safe areas for augmentation, and generating synthetic data points. Experimental results show that the proposed method consistently outperforms other state-of-the-art oversamplers on several common imbalanced datasets. The method is easy to use and is available as an open-source Python library.


Article Computer Science, Artificial Intelligence

Gaussian Distribution Based Oversampling for Imbalanced Data Classification

Yuxi Xie et al.

Summary: In this study, a new data resampling technique called Gaussian Distribution based Oversampling (GDO) is proposed to handle imbalanced data for classification. Experimental results show that GDO outperforms the compared methods in terms of AUC, G-mean, and memory usage, at the cost of increased running time. The effectiveness of GDO is further validated on two real imbalanced data classification problems.


Article Computer Science, Information Systems

SMOTE-RkNN: A hybrid re-sampling method based on SMOTE and reverse k-nearest neighbors

Aimin Zhang et al.

Summary: In recent years, class imbalance learning has gained importance in machine learning. The Synthetic Minority Oversampling TEchnique (SMOTE) is a popular algorithm for addressing class imbalance, but it suffers from noise propagation. To overcome this, researchers have proposed various SMOTE variants. This paper introduces a robust and universal SMOTE hybrid variant algorithm called SMOTE-RkNN, which identifies noise using probability density instead of local neighborhood information.


Article Chemistry, Analytical

Comparing Sampling Strategies for Tackling Imbalanced Data in Human Activity Recognition

Fayez Alharbi et al.

Summary: Human activity recognition using wearable sensors is a popular research topic in machine learning, facing challenges such as class imbalance. Three hybrid sampling strategies are introduced to generate diverse synthetic samples and improve classification performance. The proposed methods show significant enhancement compared to baseline and constituent techniques in handling class imbalance.

SENSORS (2022)

Article Automation & Control Systems

Class-Imbalance Privacy-Preserving Federated Learning for Decentralized Fault Diagnosis With Biometric Authentication

Shixiang Lu et al.

Summary: This study proposes a class-imbalanced privacy-preserving federated learning framework for fault diagnosis of decentralized wind turbines. By utilizing biometric authentication and privacy-enhancing techniques, it achieves high potential privacy and security. The framework also integrates a gradient-based self-monitoring scheme to enhance the understanding of global information for class-imbalanced fault diagnosis.


Article Computer Science, Information Systems

An efficiency curve for evaluating imbalanced classifiers considering intrinsic data characteristics: Experimental analysis

Xiangrui Chao et al.

Summary: Balancing the accuracy rates of majority and minority classes is challenging in imbalanced classification. This study introduces a new criterion, an efficiency curve, to comprehensively evaluate imbalanced classifiers and analyzes the impact of imbalanced ratio and data characteristics on classifier efficiency.


Article Computer Science, Artificial Intelligence

Instance weighted SMOTE by indirectly exploring the data distribution

Aimin Zhang et al.

Summary: This study presents the instance weighted SMOTE (IW-SMOTE) algorithm, which improves the SMOTE algorithm by indirectly exploiting distribution data. It uses an UnderBagging-like undersampling ensemble algorithm to classify each training instance and acquire confusing information. Based on the confusing information, the algorithm accurately estimates the location information of each instance and handles noisy and borderline instances accordingly. The balanced instance set is then used to train multiple classifiers to verify the algorithm's generality and effectiveness.


Article Computer Science, Artificial Intelligence

Double-kernelized weighted broad learning system for imbalanced data

Wuxing Chen et al.

Summary: Broad Learning System (BLS) is a fast learning neural network that has shown good performance in various applications. However, conventional BLS has limitations in dealing with class imbalance and parameter tuning. To overcome these challenges, a double-kernelized weighted broad learning system (DKWBLS) is proposed that improves feature representation and addresses class imbalance. Experimental results demonstrate the superiority of DKWBLS in handling imbalanced data.


Review Computer Science, Information Systems

Empirical Analysis of Ensemble Learning for Imbalanced Credit Scoring Datasets: A Systematic Review

Sudhansu R. Lenka et al.

Summary: Credit scoring analysis is of great importance to researchers and the financial industry, as it helps identify deserving applicants for loans with minimal risks. Developing an accurate credit scoring model is challenging due to class imbalance and irrelevant features. Recent research has shown that ensemble learning is superior in this field. This paper conducts a comprehensive comparative analysis of ensemble algorithms to improve oversampling and feature selection techniques. Three feature selection (FS) techniques (information gain, principal component analysis, and a genetic algorithm (GA)) are used to identify relevant features. The experimental results demonstrate that the GA-based FS technique and the CatBoost algorithm outperform other models in terms of accuracy, area under the curve, F1-score, Brier score, and the Kolmogorov-Smirnov statistic.


Article Automation & Control Systems

A comprehensive comparison among metaheuristics (MHs) for geohazard modeling using machine learning: Insights from a case study of landslide displacement prediction

Junwei Ma et al.

Summary: This study proposes a systematic framework combining k-fold cross-validation, metaheuristics, support vector regression, and statistical tests to improve the reliability and performance of geohazard modeling. By comparing different algorithms, the multiverse optimizer is identified as one of the best-performing algorithms.


Article Computer Science, Information Systems

A hybrid imbalanced classification model based on data density

Shengnan Shi et al.


Article Energy & Fuels

An improved selective ensemble learning approach in enabling load classification considering base classifier redundancy and class imbalance

Shiqian Wang et al.

Summary: This article presents an improved selective ensemble learning approach to handle class imbalance and base classifier redundancy in load classification. Experimental results show that the approach is effective for load classification tasks.


Article Computer Science, Information Systems

RN-SMOTE: Reduced Noise SMOTE based on DBSCAN for enhancing imbalanced data classification

Ahmed Arafa et al.

Summary: Machine learning classifiers often struggle with imbalanced datasets, which are common in real-world scenarios. This paper presents RN-SMOTE, a preprocessing technique that addresses imbalanced classification by oversampling the minority class, removing noise, and rebalancing the dataset. Experimental results demonstrate that RN-SMOTE significantly improves classifier performance compared to the original data and traditional oversampling techniques.


Article Computer Science, Theory & Methods

The use of generative adversarial networks to alleviate class imbalance in tabular data: a survey

Rick Sauber-Cole et al.

Summary: The existence of class imbalance in a dataset can bias the classifier towards majority classification. Generative Adversarial Networks (GANs) have been used to generate instances of the underrepresented class(es) to mitigate this issue. While most research focuses on their application in computer vision tasks, GANs are also being used for tabular data with traditional structured data types.


Article Computer Science, Information Systems

RSMOTE: A self-adaptive robust SMOTE for imbalanced problems with label noise

Baiyun Chen et al.

Summary: Imbalanced classification is an important task in supervised learning, and the proposed self-adaptive robust SMOTE, RSMOTE, addresses this issue effectively by introducing relative density to adaptively divide minority samples and reweigh the number of samples needed to be generated based on their chaotic level. Experimental results show that RSMOTE outperforms comparison methods across various metrics, indicating its superiority in handling imbalanced classification with label noise.


Article Computer Science, Information Systems

A Gaussian mixture model based virtual sample generation approach for small datasets in industrial processes

Ling Li et al.

Summary: This paper addresses the small dataset problem by developing an information expansion function and a Gaussian mixture model based virtual sample generation method, which improves modeling performance.


Article Computer Science, Artificial Intelligence

LoRAS: an oversampling approach for imbalanced datasets

Saptarshi Bej et al.

Summary: This article introduces LoRAS, a method that overcomes the limitations of the SMOTE oversampling technique. Experiments show that LoRAS yields better machine learning models on imbalanced datasets, improving F1-score and balanced accuracy. Compared to most SMOTE extensions, LoRAS achieves better results in generating classification models.


Article Computer Science, Information Systems

A Synthetic Minority Based on Probabilistic Distribution (SyMProD) Oversampling for Imbalanced Datasets

Intouch Kunakorntum et al.


Article Computer Science, Information Systems

Geometric SMOTE a geometrically enhanced drop-in replacement for SMOTE

Georgios Douzas et al.


Article Computer Science, Information Systems

Grouped SMOTE With Noise Filtering Mechanism for Classifying Imbalanced Data

Ke Cheng et al.


Article Computer Science, Information Systems

Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE

Georgios Douzas et al.


Article Computer Science, Artificial Intelligence

KEEL 3.0: An Open Source Software for Multi-Stage Analysis in Data Mining

Isaac Triguero et al.

International Journal of Computational Intelligence Systems (2017)

Article Computer Science, Artificial Intelligence

PSO-based method for SVM classification on skewed data sets

Jair Cervantes et al.


Article Computer Science, Artificial Intelligence

Preprocessing unbalanced data using support vector machine

M. A. H. Farquad et al.