Article

ROSE: robust online self-adjusting ensemble for continual learning on imbalanced drifting data streams

Journal

MACHINE LEARNING
Volume 111, Issue 7, Pages 2561-2599

Publisher

SPRINGER
DOI: 10.1007/s10994-022-06168-x

Keywords

Data streams; Concept drift; Online learning; Continual learning; Imbalanced data

Funding

  1. 2018 VCU Presidential Research Quest Fund
  2. Amazon AWS Machine Learning Research award

This article introduces ROSE, an online ensemble classifier for data streams that exhibit both concept drift and class imbalance. ROSE combines online training of base classifiers on random feature subsets, online drift detection backed by a background ensemble, a sliding window per class to counter imbalance, and self-adjusting bagging. In a comparison with 30 ensemble classifiers, ROSE proves robust on drifting imbalanced streams while keeping time and memory costs competitive.
Data streams are potentially unbounded sequences of instances arriving over time to a classifier. Designing algorithms that are capable of dealing with massive, rapidly arriving information is one of the most dynamically developing areas of machine learning. Such learners must be able to deal with a phenomenon known as concept drift, where the data stream may be subject to various changes in its characteristics over time. Furthermore, distributions of classes may evolve over time, leading to a highly difficult non-stationary class imbalance. In this work we introduce Robust Online Self-Adjusting Ensemble (ROSE), a novel online ensemble classifier capable of dealing with all of the mentioned challenges. The main features of ROSE are: (1) online training of base classifiers on variable-size random subsets of features; (2) online detection of concept drift and creation of a background ensemble for faster adaptation to changes; (3) a sliding window per class to create skew-insensitive classifiers regardless of the current imbalance ratio; and (4) self-adjusting bagging to enhance the exposure of difficult instances from minority classes. The interplay among these features leads to improved performance in various data stream mining benchmarks. An extensive experimental study comparing ROSE with 30 ensemble classifiers shows that it is a robust and well-rounded classifier for drifting imbalanced data streams, especially in the presence of noise and class imbalance drift, while maintaining competitive time complexity and memory consumption. Results are supported by a thorough non-parametric statistical analysis.
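Features (3) and (4) of the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the class names `PerClassWindow` and `ClassAwareBagger`, the parameters `base_lambda` and `decay`, and in particular the rule that scales the Poisson rate by the logarithm of the imbalance ratio are all assumptions made for illustration. The sketch only shows the general idea of keeping a separate window of recent instances for each class and giving instances of currently rarer classes a larger expected training weight in online bagging.

```python
import math
import random
from collections import defaultdict, deque


class PerClassWindow:
    """Keeps the most recent `size` instances of each class separately."""

    def __init__(self, size):
        self.size = size
        self.windows = defaultdict(lambda: deque(maxlen=size))

    def add(self, x, y):
        # The oldest instance of class y is evicted once its window is full.
        self.windows[y].append(x)

    def counts(self):
        return {label: len(w) for label, w in self.windows.items()}


def poisson(lam, rng):
    """Sample from Poisson(lam) with Knuth's method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class ClassAwareBagger:
    """Online bagging weights that grow for currently rare classes.

    Hypothetical rule, not taken from the paper:
    rate(y) = base_lambda * (1 + ln(majority_freq / freq[y])).
    """

    def __init__(self, base_lambda=6.0, decay=0.99, seed=0):
        self.base_lambda = base_lambda
        self.decay = decay
        self.freq = defaultdict(float)  # exponentially decayed class counts
        self.rng = random.Random(seed)

    def observe(self, y):
        # Decay old counts so the estimate follows imbalance drift.
        for label in self.freq:
            self.freq[label] *= self.decay
        self.freq[y] += 1.0

    def rate(self, y):
        majority = max(self.freq.values(), default=0.0)
        seen = self.freq.get(y, 0.0)
        if seen == 0.0 or majority == 0.0:
            return self.base_lambda
        return self.base_lambda * (1.0 + math.log(majority / seen))

    def weight(self, y):
        # Number of times a base classifier would train on this instance.
        return poisson(self.rate(y), self.rng)
```

In use, each arriving labelled instance would be passed to `observe` and `add`, and every base classifier would train on it `weight(y)` times. The majority class keeps the base rate, while a minority class receives a higher rate that relaxes again if the class becomes more frequent.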
