4.7 Article

Local Feature Selection for Large-Scale Data Sets With Limited Labels

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TKDE.2022.3181208

Keywords

Data mining; semi-supervised learning; local feature selection; rough set; related family

This paper proposes a local feature selection method based on related family, which can accelerate data processing for large-scale data sets. The experiments demonstrate that the proposed algorithm is 405 times faster than LARD on partially labeled data sets while maintaining high classification accuracy. Additionally, this algorithm can effectively process partially labeled large-scale data sets with 5,000,000 samples or 20,000 features on a typical personal computer.
Processing large-scale data sets with limited labels has always been a difficult task in data mining. To address this difficulty, two local feature selection algorithms based on dependency degree, LARD and LRSD, have been proposed; they can process partially labeled data sets and greatly improve computational efficiency. However, these algorithms struggle to process large-scale data with millions of samples on a typical personal computer. Although the related family method is more efficient than dependency degree, it cannot be applied to partially labeled large-scale data. This paper therefore proposes a local feature selection method based on related family to accelerate data processing. Experiments show that the proposed algorithm can run 405 times faster than LARD on partially labeled data sets while maintaining high classification accuracy. In addition, the new algorithm can effectively process partially labeled large-scale data sets with 5,000,000 samples or 20,000 features on a typical personal computer.
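
For readers unfamiliar with the dependency degree mentioned above, the following is a minimal sketch of the classic rough-set definition, gamma_B(D) = |POS_B(D)| / |U|, on a small, fully labeled toy table. It is not the LARD, LRSD, or related-family algorithm from the paper; the function name `dependency_degree` and the toy data are assumptions made purely for illustration.

```python
from collections import defaultdict


def dependency_degree(rows, feature_idx, labels):
    """Classic rough-set dependency degree: |POS_B(D)| / |U|.

    rows        -- list of tuples of discrete feature values (hypothetical toy data)
    feature_idx -- indices of the feature subset B
    labels      -- decision label of each row (all rows labeled in this sketch)
    """
    # Group sample indices into equivalence classes by their values on B.
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        key = tuple(row[j] for j in feature_idx)
        blocks[key].append(i)

    # A class lies in the positive region when all its samples share one label.
    positive = sum(
        len(block)
        for block in blocks.values()
        if len({labels[i] for i in block}) == 1
    )
    return positive / len(rows)


# Toy decision table: two discrete features, four samples.
rows = [(0, 1), (0, 1), (1, 0), (1, 1)]
labels = ["a", "b", "a", "b"]

print(dependency_degree(rows, [0], labels))
print(dependency_degree(rows, [0, 1], labels))
```

A feature subset with a higher dependency degree separates more samples consistently by label. Recomputing these equivalence classes over every sample for each candidate feature is the kind of cost that becomes prohibitive at the scale the abstract describes.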
