4.7 Article

Handling imbalanced data with concept drift by applying dynamic sampling and ensemble classification model

Journal

COMPUTER COMMUNICATIONS
Volume 153, Issue -, Pages 553-560

Publisher

ELSEVIER
DOI: 10.1016/j.comcom.2020.01.061

Keywords

Imbalanced class distribution; Concept drifts; Sampling; Resampling; Optimal reservoir size; Ensemble classification


With the availability of a broad range of applications for Big Data streaming, both class imbalance and concept drift have become crucial learning issues. Concept drift handling solutions are sensitive to class imbalance. Sampling techniques are widely applied to process continuously arriving data streams with a sufficient number of instances, and the selected instances have to support statistical inference over the imbalanced class distribution. A stream data classification model without concept drift adaptation is not suitable for an imbalanced class distribution. To solve these issues, this article presents a dynamic sampling and ensemble classification technique named Handling Imbalanced Data with Concept Drift (HIDC). To provide high statistical precision over an imbalanced class distribution with concept drift, HIDC decides an optimal reservoir size using metrics on the statistical properties of the stream data and a control parameter. The former refers to the inequality level in the values of instances arriving from a source, and the latter controls the selection of instances from multiple sources. HIDC estimates the optimal reservoir size using these statistical and control parameters. To select appropriate instances within the allocated optimal reservoir size, HIDC applies random sampling over the imbalanced classes and chooses a set of instances from multiple sources. Random sampling alone cannot resolve the imbalanced class distribution among the existing classes; to address this, HIDC applies resampling techniques with respect to the imbalance factor. To identify and address new concepts, the proposed HIDC sampling model trains a candidate classifier and replaces the worst ensemble member with it. Finally, the experimental results show that HIDC performs better sampling and mining over imbalanced class distributions with concept drift.
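The abstract does not give HIDC's sizing formula, sampling procedure, or ensemble update rule in enough detail to reproduce them. The following is therefore only a minimal Python sketch of the generic building blocks the abstract names, under loudly stated assumptions. The Gini coefficient stands in for the "inequality level" metric. The sizing rule in `reservoir_size` (with its `base`, `control`, and `max_size` parameters) is hypothetical. Selection uses classic reservoir sampling (Algorithm R) on a single stream, so multiple sources are not modeled. Resampling is plain random oversampling toward an assumed `max_imbalance` factor. The ensemble update swaps the candidate in for the member with the lowest accuracy on the latest chunk. None of these function names come from the paper.

```python
import math
import random
from collections import Counter, defaultdict


def gini(values):
    """Gini coefficient of non-negative values: 0 means perfectly equal."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Closed form on sorted data: sum_i (2i - n - 1) * x_i / (n * sum(x)), i = 1..n
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)


def reservoir_size(values, base=50, control=1.0, max_size=500):
    """HYPOTHETICAL sizing rule (not from the paper): enlarge the
    reservoir as the inequality (Gini) of the source's values grows."""
    return min(max_size, math.ceil(base * (1 + control * gini(values))))


def reservoir_sample(stream, k, rng):
    """Algorithm R: uniform random sample of k items from a stream of unknown length."""
    reservoir = []
    for t, item in enumerate(stream):
        if t < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, t)  # inclusive bounds, so P(j < k) = k / (t + 1)
            if j < k:
                reservoir[j] = item
    return reservoir


def rebalance(samples, max_imbalance, rng):
    """ASSUMED resampling step: random oversampling that tops up each
    class to at least majority_count / max_imbalance instances."""
    by_class = defaultdict(list)
    for x, y in samples:
        by_class[y].append((x, y))
    majority = max(len(items) for items in by_class.values())
    need = math.ceil(majority / max_imbalance)
    out = []
    for items in by_class.values():
        out.extend(items)
        # Duplicate randomly chosen existing instances (empty range if already enough)
        out.extend(rng.choice(items) for _ in range(need - len(items)))
    return out


def accuracy(clf, chunk):
    """Fraction of (x, y) pairs in the chunk that the classifier predicts correctly."""
    return sum(clf(x) == y for x, y in chunk) / len(chunk)


def update_ensemble(ensemble, candidate, chunk, capacity):
    """Add the candidate; once the ensemble is full, the candidate replaces
    the member that scores worst on the most recent chunk."""
    if len(ensemble) < capacity:
        return ensemble + [candidate]
    scores = [accuracy(clf, chunk) for clf in ensemble]
    worst = scores.index(min(scores))
    return ensemble[:worst] + [candidate] + ensemble[worst + 1:]


def predict(ensemble, x):
    """Unweighted majority vote over the ensemble members."""
    return Counter(clf(x) for clf in ensemble).most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic imbalanced stream: roughly 10% of instances belong to class 1.
    stream = [(rng.random(), int(rng.random() < 0.1)) for _ in range(5000)]
    k = reservoir_size([x for x, _ in stream[:200]])
    sample = reservoir_sample(stream, k, rng)
    balanced = rebalance(sample, max_imbalance=2.0, rng=rng)
    print(k, Counter(y for _, y in sample), Counter(y for _, y in balanced))
```

Running the script draws a reservoir from a synthetic stream in which roughly 10% of instances are class 1, then prints the class counts before and after oversampling. The oversampling step only duplicates existing minority instances rather than synthesizing new ones (as SMOTE would), which keeps the sketch dependency-free. Classifiers are modeled as plain callables so the replace-the-worst-member rule can be shown without an ML library.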

