Article

Online Feature Screening for Data Streams With Concept Drift

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TKDE.2022.3232752

Keywords

Feature extraction; Adaptation models; Fading channels; Data models; Computational modeling; Uncertainty; Indexes; Concept drift; data stream mining; feature screening; feature selection; model adaptation

This paper proposes online feature screening methods and compares them with traditional (offline) screening methods. The experiments show that the online methods can handle modern datasets with streaming input, sparsity, and concept drift, and that they produce the same feature importance as their offline versions while running faster and requiring less storage.
Feature screening methods are often used as a preprocessing step to reduce the number of variables before a model is trained. Traditional screening methods focus only on complete, high-dimensional datasets. However, modern datasets not only have higher dimensions and larger sample sizes but also exhibit properties such as streaming input, sparsity, and concept drift. Consequently, a considerable number of online feature selection methods have been introduced in recent years to handle these problems, and online screening methods form one category of them. The methods we propose in this paper can handle all three of these situations in classification settings. Our experiments show that the proposed methods generate the same feature importance as their offline versions while running faster and requiring less storage. Furthermore, on data streams exhibiting concept drift, online screening methods with integrated model adaptation achieve a higher true feature detection rate than those without model adaptation. On three large real datasets that potentially exhibit concept drift, online screening methods with model adaptation show advantages in saving computation time and space, reducing model complexity, or improving prediction accuracy.
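The abstract does not state which screening statistic or which model-adaptation mechanism the paper uses. As a hypothetical illustration of its two central claims (online scores matching offline ones, and adaptation helping under concept drift), the following sketch keeps per-class running counts, sums, and sums of squares for a two-sample t statistic in binary classification. The class name `OnlineTScreener`, the helper `offline_scores`, the `decay` parameter, and the exponential-forgetting scheme are all assumptions made for this example, not the authors' method.

```python
import math
import random
import statistics


class OnlineTScreener:
    """Hypothetical online screening sketch for binary labels (0 or 1).

    Stores only per-class sufficient statistics (weighted count, sum and
    sum of squares of every feature), so memory is O(p) no matter how many
    samples have streamed past. decay=1.0 reproduces the offline statistic;
    decay < 1.0 exponentially forgets old samples, an assumed, simple
    stand-in for model adaptation under concept drift.
    """

    def __init__(self, n_features, decay=1.0):
        self.p = n_features
        self.decay = decay
        self.n = [0.0, 0.0]                               # weighted count per class
        self.s = [[0.0] * n_features for _ in range(2)]   # weighted sums
        self.ss = [[0.0] * n_features for _ in range(2)]  # weighted sums of squares

    def update(self, x, y):
        # Fade every stored statistic, then add the new sample to class y.
        if self.decay != 1.0:
            for c in range(2):
                self.n[c] *= self.decay
                for j in range(self.p):
                    self.s[c][j] *= self.decay
                    self.ss[c][j] *= self.decay
        self.n[y] += 1.0
        for j, v in enumerate(x):
            self.s[y][j] += v
            self.ss[y][j] += v * v

    def scores(self):
        # |mean1 - mean0| / standard error, using population variances.
        # Requires at least one sample from each class.
        out = []
        for j in range(self.p):
            m = [self.s[c][j] / self.n[c] for c in range(2)]
            v = [max(self.ss[c][j] / self.n[c] - m[c] ** 2, 0.0) for c in range(2)]
            se = math.sqrt(v[0] / self.n[0] + v[1] / self.n[1])
            out.append(abs(m[1] - m[0]) / max(se, 1e-12))
        return out

    def top_k(self, k):
        # Indices of the k features with the largest screening scores.
        sc = self.scores()
        return sorted(range(self.p), key=lambda j: sc[j], reverse=True)[:k]


def offline_scores(X, y):
    # The same statistic computed in one batch pass over fully stored data.
    out = []
    for j in range(len(X[0])):
        cols = [[row[j] for row, lab in zip(X, y) if lab == c] for c in range(2)]
        m = [statistics.fmean(col) for col in cols]
        v = [statistics.pvariance(col) for col in cols]
        se = math.sqrt(v[0] / len(cols[0]) + v[1] / len(cols[1]))
        out.append(abs(m[1] - m[0]) / max(se, 1e-12))
    return out


# Usage: a synthetic stream in which only feature 5 depends on the label.
random.seed(0)
X, y = [], []
for _ in range(400):
    label = random.randint(0, 1)
    row = [random.gauss(0.0, 1.0) for _ in range(10)]
    row[5] += 3.0 * label
    X.append(row)
    y.append(label)

screener = OnlineTScreener(n_features=10)
for row, label in zip(X, y):
    screener.update(row, label)

online = screener.scores()
batch = offline_scores(X, y)
print(all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9) for a, b in zip(online, batch)))  # True
print(screener.top_k(1))  # [5]
```

Because this statistic depends on the data only through counts, sums, and sums of squares, processing samples one at a time gives the same scores as a batch pass, while memory grows with the number of features rather than the number of samples. Setting `decay` below 1 gives up that exact equivalence in exchange for faster reaction when the informative features change over the stream.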