4.5 Article

The impact of data difficulty factors on classification of imbalanced and concept drifting data streams

Journal

KNOWLEDGE AND INFORMATION SYSTEMS
Volume 63, Issue 6, Pages 1429-1469

Publisher

SPRINGER LONDON LTD
DOI: 10.1007/s10115-021-01560-w

Keywords

Class imbalance; Concept drift; Data difficulty factors; Drift categorization; Stream classification

Funding

  1. PUT Institute of Computing Science Statutory Funds
  2. EPSRC [EP/R006660/1, EP/R006660/2]
  3. TAILOR - EU Horizon 2020 research and innovation program [952215]
  4. EPSRC [EP/R006660/1, EP/R006660/2] Funding Source: UKRI

Class imbalance poses additional challenges when learning classifiers from concept drifting data streams. Existing work primarily focuses on addressing the global imbalance ratio, while neglecting other data complexities. Independent research on static imbalanced data has emphasized the influential role of local data difficulty factors. Our comprehensive study shows that the interactions between concept drifts and local data difficulty factors strongly affect classification of concept drifting data streams and need to be investigated.
Class imbalance introduces additional challenges when learning classifiers from concept drifting data streams. Most existing work focuses on designing new algorithms for dealing with the global imbalance ratio and does not consider other data complexities. Independent research on static imbalanced data has highlighted the influential role of local data difficulty factors such as minority class decomposition and the presence of unsafe types of examples. Although such factors are often present in real-world data, their interactions with concept drifts have not yet been investigated in concept drifting data streams. We thoroughly study the impact of such interactions on drifting imbalanced streams. For this purpose, we put forward a new categorization of concept drifts for class imbalanced problems. Through comprehensive experiments with synthetic and real data streams, we study the influence of concept drifts, global class imbalance, local data difficulty factors, and their combinations on the predictions of representative online classifiers. Experimental results reveal the high influence of the newly considered factors and their local drifts, as well as differences in existing classifiers' reactions to such factors. Combinations of multiple factors are the most challenging for classifiers. Although existing classifiers are partially capable of coping with global class imbalance, new approaches are needed to address challenges posed by imbalanced data streams.
