☆ 4.7 Article

Neighbourhood-based undersampling approach for handling imbalanced and overlapped data

INFORMATION SCIENCES (2020)

期刊

INFORMATION SCIENCES

卷 509, 期 -, 页码 47-70

出版社

ELSEVIER SCIENCE INC

DOI: 10.1016/j.ins.2019.08.062

关键词

Imbalanced dataset; Undersampling; k-NN; Class overlap; Classification

类别

Computer Science, Information Systems

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

Class imbalanced datasets are common across different domains including health, security, banking and others. A typical supervised learning algorithm tends to be biased towards the majority class when dealing with imbalanced datasets. The learning task becomes more challenging when there is also an overlap of instances from different classes. In this paper, we propose an undersampling framework for handling class imbalance in binary datasets by removing potential overlapped data points. Our methods are designed to identify and eliminate majority class instances from the overlapping region. Accurate identification and elimination of these instances maximise the visibility of the minority class instances and at the same time minimises excessive elimination of data, which reduces information loss. Four methods based on neighbourhood searching with different criteria to identify potential overlapped instances are proposed in this paper. Extensive experiments using simulated and real-world datasets were carried out. Results show comparable performance with state-of-the-art methods across different common metrics with exceptional and statistically significant improvements in sensitivity. (C) 2019 Elsevier Inc. All rights reserved.

Neighbourhood-based undersampling approach for handling imbalanced and overlapped data

期刊

INFORMATION SCIENCES

出版社

ELSEVIER SCIENCE INC

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Neighbourhood-based undersampling approach for handling imbalanced and overlapped data

期刊

INFORMATION SCIENCES

出版社

ELSEVIER SCIENCE INC

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文