Article

SWEclat: a frequent itemset mining algorithm over streaming data using Spark Streaming

Journal

JOURNAL OF SUPERCOMPUTING
Volume 76, Issue 10, Pages 7619-7634

Publisher

SPRINGER
DOI: 10.1007/s11227-020-03190-5

Keywords

Frequent itemset mining; Streaming data; Sliding window; Distributed; Spark Streaming

Funding

  1. Natural Science Foundation of the universities in Anhui province [KJ2019A1274]

Finding frequent itemsets in continuous streaming data is an important data mining task, widely used in network monitoring, Internet of Things data analysis, and similar applications. In the era of big data, a distributed frequent itemset mining algorithm is needed to process massive streaming data. Apache Spark is a unified analytics engine for large-scale data processing that has been applied successfully in many data mining fields. In this paper, we propose SWEclat, a distributed algorithm for mining frequent itemsets over massive streaming data. The algorithm uses a sliding window to process the streaming data and a vertical data structure to store the dataset held in the sliding window. It is implemented on Apache Spark and stores both the streaming data and the vertical-format dataset as Spark RDDs, which are divided into partitions for distributed processing. Experimental results show that SWEclat achieves good speedup, parallel scalability, and load balancing.
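
To make the two ideas named in the abstract concrete (a sliding window over the stream and a vertical item-to-transaction-id layout), the following is a minimal single-machine sketch of Eclat-style mining over a sliding window. It is not the authors' SWEclat implementation and does not use Spark. The function names (`to_vertical`, `eclat`, `mine_window`, `mine_stream`), the absolute `min_support` count, and the choice to slide the window one transaction at a time are all assumptions made for illustration.

```python
from collections import deque


def to_vertical(window):
    """Convert (tid, transaction) pairs into vertical format: item -> set of tids."""
    vertical = {}
    for tid, transaction in window:
        for item in transaction:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def eclat(prefix, items, min_support, out):
    """Depth-first search: extend `prefix` by each item, intersecting tidsets."""
    for i, (item, tids) in enumerate(items):
        itemset = prefix + (item,)
        out[itemset] = len(tids)  # support = size of the tidset
        # Candidate extensions: intersect with tidsets of the remaining items.
        suffix = []
        for other, other_tids in items[i + 1:]:
            common = tids & other_tids
            if len(common) >= min_support:
                suffix.append((other, common))
        eclat(itemset, suffix, min_support, out)


def mine_window(window, min_support):
    """Return {itemset tuple: support} for all frequent itemsets in one window."""
    vertical = to_vertical(window)
    frequent_items = sorted(
        ((item, tids) for item, tids in vertical.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    result = {}
    eclat((), frequent_items, min_support, result)
    return result


def mine_stream(transactions, window_size, min_support):
    """Yield the frequent itemsets of each full window as the stream advances."""
    window = deque(maxlen=window_size)  # the oldest transaction is evicted automatically
    for tid, transaction in enumerate(transactions):
        window.append((tid, frozenset(transaction)))
        if len(window) == window_size:
            yield mine_window(list(window), min_support)


if __name__ == "__main__":
    # Hypothetical toy stream, not data from the paper.
    stream = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b", "c"]]
    for frequent in mine_stream(stream, window_size=3, min_support=2):
        print(frequent)
```

The sketch rebuilds the vertical table for every window, which keeps the code short but repeats work. A distributed version would additionally have to partition the tidsets across workers, which is the part the abstract attributes to Spark RDDs and which is not shown here.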
