4.6 Article

A new hybrid approach based on genetic algorithm and support vector machine methods for hyperparameter optimization in synthetic minority over-sampling technique (SMOTE)

期刊

AIMS MATHEMATICS
卷 8, 期 4, 页码 9400-9415

出版社

AMER INST MATHEMATICAL SCIENCES-AIMS
DOI: 10.3934/math.2023473

关键词

SMOTE; genetic algorithm; support vector machine

向作者/读者索取更多资源

The crucial problem in applying classification algorithms is imbalanced classes. The SMOTE algorithm is developed to address the classification of imbalanced datasets. It increases the number of samples by interpolating between the minority samples. However, finding the best parameter values for the SMOTE algorithm is complicated.
The crucial problem when applying classification algorithms is unequal classes. An imbalanced dataset problem means, particularly in a two-class dataset, that the group variable of one class is comparatively more dominant than the group variable of the other class. The issue stems from the fact that the majority class dominates the minority class. The synthetic minority over -sampling technique (SMOTE) has been developed to deal with the classification of imbalanced datasets. SMOTE algorithm increases the number of samples by interpolating between the clustered minority samples. The SMOTE algorithm has three critical parameters, k, perc.over, and perc.under. perc.over and perc.under hyperparameters allow determining the minority and majority class ratios. The k parameter is the number of nearest neighbors used to create new minority class instances. Finding the best parameter value in the SMOTE algorithm is complicated. A hybridized version of genetic algorithm (GA) and support vector machine (SVM) approaches was suggested to address this issue for selecting SMOTE algorithm parameters. Three scenarios were created. Scenario 1 shows the evaluation of support vector machine SVM) results without using the SMOTE algorithm. Scenario 2 shows that the SVM was used after applying SMOTE algorithm without the GA algorithm. In the third scenario, the results were analyzed using the SVM algorithm after selecting the SMOTE algorithm's optimization method. This study used two imbalanced datasets, drug use and simulation data. After, the results were compared with model performance metrics. When the model performance metrics results are examined, the results of the third scenario reach the highest performance. As a result of this study, it has been shown that a genetic algorithm can optimize class ratios and k hyperparameters to improve the performance of the SMOTE algorithm.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.6
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据