Article

A novel oversampling technique for class-imbalanced learning based on SMOTE and natural neighbors

Journal

INFORMATION SCIENCES
Volume 565, Pages 438-455

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2021.03.041

Keywords

Class-imbalance learning; Oversampling; Classification; Supervised learning; K nearest neighbors; Natural neighbors

Funding

  1. National Natural Science Foundation of China [61802360]
  2. Chongqing Education and Science Committee project [KJZH17104, CSTC2017rgunzdyfx0040]
  3. Project of Chongqing Natural Science Foundation [cstc2019jcyjmsxmX0683]
  4. Graduate Scientific Research and Innovation Foundation of Chongqing, China [CYB20063, CYB20049]

Abstract

An important cause leading to the performance deterioration of classifiers is class imbalance [1]. In class-imbalance problems, one or more classes (i.e., minority classes) have very few cases while other classes (i.e., majority classes) have large numbers of cases. Hence, class distribution is highly skewed in real-world applications of class-imbalance learning. Examples are biomedical analysis [2], fraud detection [3], enterprise credit evaluation [4], image recognition [5], etc. The prediction accuracy [6] is usually used to evaluate the performance of a trained classifier in machine learning. Developing techniques for the machine learning of a classifier from class-imbalanced data presents an important challenge.

Among the existing methods for addressing this problem, SMOTE has been successful, has received great praise, and features an extensive range of practical applications. In this paper, we focus on SMOTE and its extensions, aiming to solve the most challenging issues, namely, the choice of the parameter k and the determination of the neighbor number of each sample. Hence, a synthetic minority oversampling technique with natural neighbors (NaNSMOTE) is proposed. In NaNSMOTE, the random difference between a selected base sample and one of its natural neighbors is used to generate synthetic samples.

The main advantages of NaNSMOTE are that (a) it has an adaptive k value related to the data complexity; (b) samples of class centers have more neighbors to improve the generalization of synthetic samples, while border samples have fewer neighbors to reduce the error of synthetic samples; and (c) it can remove outliers. The effectiveness of NaNSMOTE is proven by comparing it with SMOTE and extended versions of SMOTE on real data sets. (c) 2021 Elsevier Inc. All rights reserved.
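The abstract's core idea (a mutual k-NN "natural neighbor" search whose radius grows until it converges, followed by SMOTE-style interpolation along natural-neighbor edges) can be sketched as below. This is a minimal illustration based only on the abstract, not the authors' implementation: the stopping rule, the outlier handling, and the function names (`natural_neighbors`, `nan_oversample`) are assumptions for demonstration.

```python
import numpy as np

def natural_neighbors(X):
    """Mutual-kNN 'natural neighbor' search with an adaptively growing radius r.

    Sketch of the idea in the abstract: two points are natural neighbors when
    each lies in the other's r-nearest-neighbor list.  r grows until the count
    of neighbor-less points stabilizes, so k adapts to the data complexity;
    isolated outliers end up with an empty neighbor set.
    """
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # a point never neighbors itself
    order = np.argsort(d, axis=1)          # order[i, r-1] = i's r-th nearest neighbor
    nn = [set() for _ in range(n)]
    prev_orphans = -1
    for r in range(1, n):
        for i in range(n):
            j = order[i, r - 1]
            if i in order[j, :r]:          # mutual -> i and j are natural neighbors
                nn[i].add(j)
                nn[j].add(i)
        orphans = sum(1 for s in nn if not s)
        if orphans == 0 or orphans == prev_orphans:
            break                          # search has converged
        prev_orphans = orphans
    return nn

def nan_oversample(X_min, n_new, rng=None):
    """SMOTE-style interpolation, but only along natural-neighbor edges.

    Minority samples with no natural neighbor (outliers) are never used as a
    base, mirroring advantage (c) in the abstract.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    nn = natural_neighbors(X_min)
    bases = [i for i in range(len(X_min)) if nn[i]]    # drop outliers
    synthetic = []
    for _ in range(n_new):
        i = bases[rng.integers(len(bases))]
        j = list(nn[i])[rng.integers(len(nn[i]))]
        gap = rng.random()                             # random point on the segment
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)
```

Because center samples accumulate more mutual neighbors than border samples, the interpolation naturally concentrates new points in dense minority regions, which is the behavior advantages (a) and (b) describe.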

