4.7 Article

An extensive study of C-SMOTE, a Continuous Synthetic Minority Oversampling Technique for Evolving Data Streams

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 196, Issue -, Pages -

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2022.116630

Keywords

Evolving Data Stream; Streaming; Concept drift; Balancing

This paper investigates binary classification in the presence of concept drift by rebalancing imbalanced data streams. The authors study pipelines that combine C-SMOTE with Streaming Machine Learning (SML) classification algorithms. Through experiments on synthetic and real data streams, the paper provides statistical evidence that C-SMOTE pipelines can improve minority class performance without significantly affecting majority class performance.
Streaming Machine Learning (SML) studies algorithms that update their models in a single pass over an unbounded and often non-stationary flow of data. Online class imbalance learning is a branch of SML that combines the challenges of class imbalance and concept drift. In this paper, we investigate the binary classification problem by rebalancing an imbalanced stream of data in the presence of concept drift, accessing one sample at a time. We present an extensive comparative study of the Continuous Synthetic Minority Oversampling Technique (C-SMOTE), inspired by the popular sampling technique SMOTE, as a meta-strategy to pipeline with SML classification algorithms. We benchmark C-SMOTE pipelines on both synthetic and real data streams containing different types of concept drift, different imbalance levels, and different class distributions. We provide statistical evidence that models learnt with C-SMOTE pipelines improve minority class performance with respect to both the baseline models and the state-of-the-art methods. We also perform a sensitivity analysis to assess the impact of C-SMOTE on majority class performance for the three types of concept drift and several class distributions. Finally, we present a computational cost analysis in terms of time and memory consumption.
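
To make the idea of rebalancing a stream one sample at a time more concrete, here is a minimal Python sketch of SMOTE-style oversampling over recent samples. It is not the authors' C-SMOTE implementation: the fixed-size window, the class name `WindowedOversampler`, the `process` method, and the balancing rule (emit synthetic minority points until both classes have been emitted equally often) are all assumptions made for illustration. Only the interpolation step follows the standard SMOTE idea of placing a synthetic point on the segment between a minority sample and one of its nearest minority neighbours.

```python
import random
from collections import deque


def smote_point(x, neighbors, rng):
    """SMOTE step: interpolate between x and a randomly chosen neighbor."""
    nb = rng.choice(neighbors)
    gap = rng.random()  # position along the segment from x to nb
    return [a + gap * (b - a) for a, b in zip(x, nb)]


class WindowedOversampler:
    """Hypothetical helper (not from the paper): rebalances a binary stream."""

    def __init__(self, window_size=200, k=5, seed=0):
        # Fixed-size window of recent real samples: an assumption of this sketch.
        self.window = deque(maxlen=window_size)
        self.k = k
        self.rng = random.Random(seed)
        self.emitted = {0: 0, 1: 0}  # samples handed to the learner, per class

    def process(self, x, y):
        """Return the real sample plus any synthetic minority samples."""
        self.window.append((x, y))
        self.emitted[y] += 1
        out = [(x, y)]

        # The minority class is the one with fewer real samples in the window.
        counts = {0: 0, 1: 0}
        for _, label in self.window:
            counts[label] += 1
        minority = 0 if counts[0] < counts[1] else 1
        pool = [p for p, label in self.window if label == minority]
        if len(pool) < 2:
            return out  # not enough minority points to interpolate

        # Emit synthetic points until both classes have been emitted equally.
        while self.emitted[minority] < self.emitted[1 - minority]:
            i = self.rng.randrange(len(pool))
            base = pool[i]
            others = [p for j, p in enumerate(pool) if j != i]
            others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
            out.append((smote_point(base, others[: self.k], self.rng), minority))
            self.emitted[minority] += 1
        return out
```

Each call to `process(x, y)` returns a list of `(features, label)` pairs that could be passed one by one to any incremental classifier, which is one way to read the "meta-strategy" role described above. A drift-aware variant would likely replace the fixed window with an adaptive one; that choice is left open here.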
