Article

SWEclat: a frequent itemset mining algorithm over streaming data using Spark Streaming

JOURNAL OF SUPERCOMPUTING (2020)

Journal

JOURNAL OF SUPERCOMPUTING

Volume 76, Issue 10, Pages 7619-7634

Publisher

SPRINGER

DOI: 10.1007/s11227-020-03190-5

Keywords

Frequent itemset mining; Streaming data; Sliding window; Distributed; Spark Streaming

Categories

Computer Science, Hardware & Architecture; Computer Science, Theory & Methods; Engineering, Electrical & Electronic

Funding

Natural Science Foundation of the universities in Anhui province [KJ2019A1274]

Abstract

Finding frequent itemsets in continuous streaming data is an important data mining task, widely used in network monitoring, Internet of Things data analysis, and similar applications. In the era of big data, a distributed frequent itemset mining algorithm is needed to process massive streaming data. Apache Spark is a unified analytics engine for large-scale data processing that has been applied successfully in many data mining fields. In this paper, we propose SWEclat, a distributed algorithm for mining frequent itemsets over massive streaming data. The algorithm processes the stream with a sliding window and stores the dataset in the window in a vertical data format. It is implemented on Apache Spark and uses Spark RDDs to hold both the streaming data and the vertical-format dataset; these RDDs are divided into partitions for distributed processing. Experimental results show that SWEclat achieves good speedup, parallel scalability, and load balancing.
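The abstract combines two ideas: a sliding window over the stream and a vertical data format, in which each item maps to the set of IDs of the transactions containing it (its tidset), so that Eclat-style mining computes an itemset's support by intersecting tidsets. The following is a minimal single-machine Python sketch of that combination, not the authors' implementation. It assumes a count-based window that slides by one transaction, support measured as an absolute count, and hypothetical names (`SlidingWindowVertical`, `eclat`); it does not reproduce SWEclat's Spark RDD partitioning or its actual window-update strategy.

```python
from collections import deque


class SlidingWindowVertical:
    """Hypothetical count-based sliding window whose contents are kept in
    vertical format: item -> set of transaction IDs (tidset)."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()  # (tid, items) pairs currently in the window
        self.tidsets = {}      # vertical layout of the window's transactions
        self.next_tid = 0

    def add(self, transaction):
        tid = self.next_tid
        self.next_tid += 1
        items = set(transaction)
        self.window.append((tid, items))
        for item in items:
            self.tidsets.setdefault(item, set()).add(tid)
        # Slide: once the window exceeds its size, expire the oldest transaction
        # by removing its tid from every tidset it appears in.
        if len(self.window) > self.window_size:
            old_tid, old_items = self.window.popleft()
            for item in old_items:
                self.tidsets[item].discard(old_tid)
                if not self.tidsets[item]:
                    del self.tidsets[item]


def eclat(tidsets, min_support):
    """Depth-first Eclat over vertical data: the support of an itemset is the
    size of the intersection of its items' tidsets."""
    result = {}

    def extend(prefix, prefix_tids, candidates):
        for i, (item, tids) in enumerate(candidates):
            new_tids = prefix_tids & tids if prefix else tids
            if len(new_tids) >= min_support:
                itemset = prefix + (item,)
                result[frozenset(itemset)] = len(new_tids)
                # Only items after this one are tried, so each itemset is generated once.
                extend(itemset, new_tids, candidates[i + 1:])

    frequent_items = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend((), set(), frequent_items)
    return result


if __name__ == "__main__":
    window = SlidingWindowVertical(window_size=3)
    for t in [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b", "c"]]:
        window.add(t)
    # The first transaction {a, b} has slid out; the window holds tids 1, 2, 3.
    print(window.tidsets)
    print(eclat(window.tidsets, min_support=2))
```

Keeping the window in vertical form means that expiring or adding a transaction only touches the tidsets of that transaction's items, and support counting becomes set intersection. According to the abstract, SWEclat holds this vertical dataset in Spark RDDs split into partitions; the sketch above keeps everything in one process purely for illustration.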