Journal
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
Volume 35, Issue 11, Pages 11693-11707
Publisher
IEEE COMPUTER SOC
DOI: 10.1109/TKDE.2022.3232752
Keywords
Feature extraction; Adaptation models; Fading channels; Data models; Computational modeling; Uncertainty; Indexes; Concept drift; data stream mining; feature screening; feature selection; model adaptation
This paper introduces online screening methods for feature selection and compares them with traditional screening methods. The experiments show that online screening methods can handle modern datasets with streaming input, sparsity, and concept drift, and that they produce the same feature importance as their offline versions while running faster and requiring less storage.
Screening feature selection methods are often used as a preprocessing step to reduce the number of variables before training a model. Traditional screening methods deal only with complete high-dimensional datasets. Modern datasets, however, not only have higher dimensions and larger sample sizes, but also exhibit properties such as streaming input, sparsity, and concept drift. Consequently, a considerable number of online feature selection methods have been introduced in recent years to handle these problems. Online screening methods are one category of online feature selection methods. The methods proposed in this paper can handle all three of the situations mentioned above in classification settings. Our experiments show that the proposed methods generate the same feature importance as their offline versions while running faster and requiring less storage. Furthermore, the results show that, on data streams exhibiting concept drift, online screening methods with integrated model adaptation achieve a higher true feature detection rate than those without model adaptation. On three large real datasets that potentially have concept drift, online screening methods with model adaptation show advantages in saving computation time and space, reducing model complexity, or improving prediction accuracy.
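The claim that an online screening method can reproduce its offline counterpart can be illustrated with a minimal sketch. The code below is not the authors' algorithm. It assumes a simple two-class t-score as the screening statistic and keeps only per-class running counts, sums, and sums of squares. The names `OnlineTScreen`, `offline_scores`, and the `fading` parameter are hypothetical and introduced only for this illustration. With `fading=1.0` the online scores match the offline ones; a value below 1 down-weights older batches, which is one simple way to adapt to drift.

```python
import numpy as np


class OnlineTScreen:
    """Hypothetical online two-class t-score screener (illustration only)."""

    def __init__(self, n_features, fading=1.0):
        # fading=1.0 keeps all history; fading<1.0 decays the stored
        # statistics once per batch (an assumed stand-in for model adaptation).
        self.fading = fading
        self.n = np.zeros(2)                 # per-class (weighted) counts
        self.s = np.zeros((2, n_features))   # per-class sums
        self.ss = np.zeros((2, n_features))  # per-class sums of squares

    def partial_fit(self, X, y):
        # Decay old statistics, then add the new batch; the batch itself
        # does not need to be stored afterwards.
        self.n *= self.fading
        self.s *= self.fading
        self.ss *= self.fading
        for c in (0, 1):
            Xc = X[y == c]
            self.n[c] += Xc.shape[0]
            self.s[c] += Xc.sum(axis=0)
            self.ss[c] += (Xc ** 2).sum(axis=0)
        return self

    def scores(self):
        # Absolute t-like score per feature, using biased class variances.
        mean = self.s / self.n[:, None]
        var = self.ss / self.n[:, None] - mean ** 2
        se = np.sqrt(var[0] / self.n[0] + var[1] / self.n[1])
        return np.abs(mean[0] - mean[1]) / se


def offline_scores(X, y):
    """Same statistic computed in one pass over the complete dataset."""
    X0, X1 = X[y == 0], X[y == 1]
    se = np.sqrt(X0.var(axis=0) / len(X0) + X1.var(axis=0) / len(X1))
    return np.abs(X0.mean(axis=0) - X1.mean(axis=0)) / se


# Usage: stream synthetic data in batches of 100 rows.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 20))
y_demo = rng.integers(0, 2, size=1000)
X_demo[y_demo == 1, :3] += 1.0  # make the first three features informative

screener = OnlineTScreen(n_features=20)
for start in range(0, 1000, 100):
    screener.partial_fit(X_demo[start:start + 100], y_demo[start:start + 100])

top3 = np.argsort(screener.scores())[::-1][:3]
print("top-ranked features:", sorted(top3.tolist()))
print("matches offline:", np.allclose(screener.scores(), offline_scores(X_demo, y_demo)))
```

Storage in this sketch is proportional to the number of features rather than the number of samples, which is the kind of saving the abstract refers to. The sums-of-squares form is chosen for brevity; a numerically more robust variant would update means and variances incrementally.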