Article

A novel oversampling technique for class-imbalanced learning based on SMOTE and natural neighbors

Journal

INFORMATION SCIENCES
Volume 565, Pages 438-455

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2021.03.041

Keywords

Class-imbalance learning; Oversampling; Classification; Supervised learning; K nearest neighbors; Natural neighbors

Funding

  1. National Natural Science Foundation of China [61802360]
  2. Chongqing Education and Science Committee project [KJZH17104, CSTC2017rgunzdyfx0040]
  3. Project of Chongqing Natural Science Foundation [cstc2019jcyjmsxmX0683]
  4. Graduate Scientific Research and Innovation Foundation of Chongqing, China [CYB20063, CYB20049]

Abstract

An important cause leading to the performance deterioration of classifiers is class imbalance [1]. In class-imbalance problems, one or more classes (i.e., minority classes) have very few cases while other classes (i.e., majority classes) have large numbers of cases. Hence, class distribution is highly skewed in real-world applications of class-imbalance learning. Examples are biomedical analysis [2], fraud detection [3], enterprise credit evaluation [4], image recognition [5], etc. The prediction accuracy [6] is usually used to evaluate the performance of a trained classifier in machine learning. Developing techniques for the machine learning of a classifier from class-imbalanced data presents an important challenge.

Among the existing methods for addressing this problem, SMOTE has been successful, has received great praise, and features an extensive range of practical applications. In this paper, we focus on SMOTE and its extensions, aiming to solve the most challenging issues, namely, the choice of the parameter k and the determination of the neighbor number of each sample. Hence, a synthetic minority oversampling technique with natural neighbors (NaNSMOTE) is proposed. In NaNSMOTE, the random difference between a selected base sample and one of its natural neighbors is used to generate synthetic samples.

The main advantages of NaNSMOTE are that (a) it has an adaptive k value related to the data complexity; (b) samples of class centers have more neighbors to improve the generalization of synthetic samples, while border samples have fewer neighbors to reduce the error of synthetic samples; and (c) it can remove outliers. The effectiveness of NaNSMOTE is proven by comparing it with SMOTE and extended versions of SMOTE on real data sets. (c) 2021 Elsevier Inc. All rights reserved.
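The abstract's core idea (a mutual k-NN "natural neighbor" search whose radius grows until it converges, followed by SMOTE-style interpolation along natural-neighbor edges) can be sketched as below. This is a minimal illustration based only on the abstract, not the authors' implementation: the stopping rule, the outlier handling, and the function names (`natural_neighbors`, `nan_oversample`) are assumptions for demonstration.

```python
import numpy as np

def natural_neighbors(X):
    """Mutual-kNN 'natural neighbor' search with an adaptively growing radius r.

    Sketch of the idea in the abstract: two points are natural neighbors when
    each lies in the other's r-nearest-neighbor list.  r grows until the count
    of neighbor-less points stabilizes, so k adapts to the data complexity;
    isolated outliers end up with an empty neighbor set.
    """
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # a point never neighbors itself
    order = np.argsort(d, axis=1)          # order[i, r-1] = i's r-th nearest neighbor
    nn = [set() for _ in range(n)]
    prev_orphans = -1
    for r in range(1, n):
        for i in range(n):
            j = order[i, r - 1]
            if i in order[j, :r]:          # mutual -> i and j are natural neighbors
                nn[i].add(j)
                nn[j].add(i)
        orphans = sum(1 for s in nn if not s)
        if orphans == 0 or orphans == prev_orphans:
            break                          # search has converged
        prev_orphans = orphans
    return nn

def nan_oversample(X_min, n_new, rng=None):
    """SMOTE-style interpolation, but only along natural-neighbor edges.

    Minority samples with no natural neighbor (outliers) are never used as a
    base, mirroring advantage (c) in the abstract.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    nn = natural_neighbors(X_min)
    bases = [i for i in range(len(X_min)) if nn[i]]    # drop outliers
    synthetic = []
    for _ in range(n_new):
        i = bases[rng.integers(len(bases))]
        j = list(nn[i])[rng.integers(len(nn[i]))]
        gap = rng.random()                             # random point on the segment
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)
```

Because center samples accumulate more mutual neighbors than border samples, the interpolation naturally concentrates new points in dense minority regions, which is the behavior advantages (a) and (b) describe.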

