Article

A scalable and effective rough set theory-based approach for big data pre-processing

Journal

KNOWLEDGE AND INFORMATION SYSTEMS
Volume 62, Issue 8, Pages 3321-3386

Publisher

SPRINGER LONDON LTD
DOI: 10.1007/s10115-020-01467-y

Keywords

Big data; Data pre-processing; Rough set theory; Distributed processing; Scalability; High-performance computing

Funding

  1. European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie Grant [702527]
  2. Marie Curie Actions (MSCA) [702527]

Abstract

A major challenge in the knowledge discovery process is performing data pre-processing, specifically feature selection, on large amounts of data with high-dimensional attribute sets. A variety of techniques have been proposed in the literature to deal with this challenge, with varying degrees of success, as most of them require further information about the given input data for thresholding, need noise levels to be specified, or rely on feature ranking procedures. To overcome these limitations, rough set theory (RST) can be used to discover the dependency within the data and reduce the number of attributes in an input data set using the data alone, requiring no supplementary information. However, when it comes to massive data sets, RST reaches its limits, as it is computationally very expensive. In this paper, we propose a scalable and effective rough set theory-based approach for large-scale data pre-processing, specifically feature selection, under the Spark framework. In our detailed experiments, data sets with up to 10,000 attributes have been considered, revealing that our proposed solution achieves a good speedup and performs its feature selection task well without sacrificing performance, making it relevant to big data.
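The RST idea the abstract relies on, measuring how strongly a set of condition attributes determines the decision attribute (the dependency degree) and greedily growing a minimal attribute subset (a reduct), can be sketched on a single machine. This is a minimal illustration under assumed function names; the paper's actual contribution is a distributed Spark implementation of this kind of computation.

```python
from collections import defaultdict

def dependency(rows, attrs, decision_idx):
    """Rough-set dependency degree: the fraction of rows lying in
    equivalence classes (identical values on `attrs`) whose members
    all share the same decision value (the positive region)."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[a] for a in attrs)].append(row[decision_idx])
    pos = sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)
    return pos / len(rows)

def quick_reduct(rows, cond_attrs, decision_idx):
    """Greedy QuickReduct-style search: repeatedly add the attribute
    that raises the dependency degree most, until the subset determines
    the decision as well as the full attribute set does."""
    target = dependency(rows, cond_attrs, decision_idx)
    reduct = []
    while dependency(rows, reduct, decision_idx) < target:
        best = max((a for a in cond_attrs if a not in reduct),
                   key=lambda a: dependency(rows, reduct + [a], decision_idx))
        reduct.append(best)
    return reduct

# Toy decision table: columns 0-2 are condition attributes, column 3 the decision.
rows = [
    ('a', 'x', 'p', 0),
    ('a', 'y', 'p', 1),
    ('b', 'x', 'q', 0),
    ('b', 'y', 'q', 1),
]
print(quick_reduct(rows, [0, 1, 2], 3))  # -> [1]: attribute 1 alone determines the decision
```

Note that no thresholds, noise parameters, or external rankings appear anywhere: the reduct is derived from the data's own indiscernibility structure, which is the property the abstract highlights. The expensive part is forming equivalence classes over every candidate attribute subset, which is what motivates the distributed design.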
