Article

On the class overlap problem in imbalanced data classification

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 212, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.knosys.2020.106631

Keywords

Imbalanced data; Class overlap; Classification; Evaluation metric; Benchmark

This paper investigates the impact of class imbalance and class overlap on learning algorithm performance, with experimental results showing that class overlap has a greater negative impact on performance. In addition, existing approaches for handling imbalanced datasets are critically reviewed, emphasizing the importance of further research on addressing class overlap.
Class imbalance is an active research area in the machine learning community. However, existing and recent literature shows that class overlap has a higher negative impact on the performance of learning algorithms. This paper provides a detailed critical discussion and objective evaluation of class overlap in the context of imbalanced data and its impact on classification accuracy. First, we present a thorough experimental comparison of class overlap and class imbalance. Unlike previous work, our experiment was carried out on the full scale of class overlap and an extreme range of class imbalance degrees. Second, we provide an in-depth critical technical review of existing approaches to handling imbalanced datasets. Existing solutions from selected literature are critically reviewed and categorised as class distribution-based and class overlap-based methods. Emerging techniques and the latest developments in this area are also discussed in detail. Experimental results in this paper are consistent with existing literature and show clearly that the performance of the learning algorithm deteriorates across varying degrees of class overlap, whereas class imbalance does not always have an effect. The review emphasises the need for further research towards handling class overlap in imbalanced datasets to effectively improve learning algorithms' performance. © 2020 Elsevier B.V. All rights reserved.
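
The kind of comparison described in the abstract can be illustrated with a small synthetic experiment. The sketch below is not the authors' protocol: the 1-D uniform data generator, the k-nearest-neighbour classifier, the sample sizes, and the names `make_data`, `knn_predict`, and `minority_recall` are all assumptions made for illustration. It varies the degree of class overlap and the imbalance ratio independently and reports recall on the minority class.

```python
import random
from heapq import nsmallest


def make_data(n_major, n_minor, overlap, rng):
    """Return (x, label) pairs; label 1 is the minority class.

    Assumed generator: majority samples are uniform on [0, 1] and minority
    samples are uniform on [1 - overlap, 2 - overlap], so `overlap` is the
    fraction of the minority range that also contains majority samples.
    """
    major = [(rng.uniform(0.0, 1.0), 0) for _ in range(n_major)]
    minor = [(rng.uniform(1.0 - overlap, 2.0 - overlap), 1) for _ in range(n_minor)]
    return major + minor


def knn_predict(train, x, k=5):
    """Majority vote among the k training points closest to x."""
    nearest = nsmallest(k, train, key=lambda p: abs(p[0] - x))
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def minority_recall(overlap, imbalance_ratio, n_major=500, n_test=200, seed=0):
    """Fraction of fresh minority samples that k-NN labels as minority."""
    rng = random.Random(seed)
    n_minor = max(1, n_major // imbalance_ratio)
    train = make_data(n_major, n_minor, overlap, rng)
    test_minor = [rng.uniform(1.0 - overlap, 2.0 - overlap) for _ in range(n_test)]
    hits = sum(knn_predict(train, x) for x in test_minor)
    return hits / n_test


if __name__ == "__main__":
    for ratio in (1, 10):
        for overlap in (0.0, 0.5, 1.0):
            recall = minority_recall(overlap, ratio)
            print(f"imbalance 1:{ratio}  overlap {overlap:.1f}  minority recall {recall:.2f}")
```

In this toy setting, minority recall should fall as overlap grows at either imbalance ratio, while the imbalance ratio matters mainly where the classes overlap. A 1-D toy with a single classifier only hints at the effect; it does not reproduce the paper's results.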
