☆ 4.5 Article

HFIM: a Spark-based hybrid frequent itemset mining algorithm for big data processing

JOURNAL OF SUPERCOMPUTING (2017)

Journal

JOURNAL OF SUPERCOMPUTING

Volume 73, Issue 8, Pages 3652-3668

Publisher

SPRINGER

DOI: 10.1007/s11227-017-1963-4

Keywords

Frequent pattern mining; Big data; Apache Spark; Apriori algorithm

Categories

Computer Science, Hardware & Architecture; Computer Science, Theory & Methods; Engineering, Electrical & Electronic

Funding

Department of Computer Science & Engineering, Indian Institute of Technology (ISM), Dhanbad, India


Abstract

Frequent itemset mining is a data mining technique for discovering frequent patterns, which are used in prediction, association rule mining, classification, and other tasks. The Apriori algorithm is an iterative algorithm that finds frequent itemsets in a transactional dataset. It scans the complete dataset in each iteration to generate the frequent itemsets of different cardinality, which is acceptable for small data but not feasible for big data. The MapReduce framework provides a distributed environment for running Apriori on big transactional data. However, MapReduce is not well suited to iterative processing, and performance degrades as a result. We introduce a novel algorithm named Hybrid Frequent Itemset Mining (HFIM), which uses the vertical layout of the dataset to avoid scanning the dataset in each iteration. The vertical dataset carries the information needed to find the support of each itemset. We also include some enhancements to reduce the number of candidate itemsets. The proposed algorithm is implemented on the Spark framework, which incorporates the concept of resilient distributed datasets and performs in-memory processing to reduce execution time. We compare the performance of HFIM with another Spark-based implementation of the Apriori algorithm on various datasets. Experimental results show that HFIM performs better in terms of execution time and space consumption.
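
The abstract does not spell out how the vertical layout is built or used, so the following is only a minimal single-machine sketch of the general idea, not the authors' HFIM implementation and not a Spark program. It assumes the common vertical representation in which each item maps to the set of transaction IDs containing it, so that the support of an itemset can be read from a set intersection instead of a fresh scan of the data. The function names `to_vertical` and `mine_frequent_itemsets` are hypothetical, and `min_support` is assumed to be an absolute transaction count.

```python
from itertools import combinations


def to_vertical(transactions):
    """Map each item to the set of transaction IDs (tidset) that contain it."""
    vertical = {}
    for tid, items in enumerate(transactions):
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def mine_frequent_itemsets(transactions, min_support):
    """Level-wise mining where support comes from tidset intersections.

    Illustrative assumption: min_support is an absolute count.
    Returns a dict mapping frozenset(itemset) -> support count.
    """
    vertical = to_vertical(transactions)  # the only pass over the raw data

    # Frequent 1-itemsets: items whose tidset is large enough.
    level = {
        frozenset([item]): tids
        for item, tids in vertical.items()
        if len(tids) >= min_support
    }

    frequent = {}
    while level:
        frequent.update({itemset: len(tids) for itemset, tids in level.items()})
        next_level = {}
        for (a, tids_a), (b, tids_b) in combinations(level.items(), 2):
            candidate = a | b
            # Join only pairs that differ in exactly one item.
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            # Apriori pruning: every subset one item smaller must be frequent.
            if any(candidate - {x} not in level for x in candidate):
                continue
            # Support of the candidate is the size of the intersected tidsets.
            tids = tids_a & tids_b
            if len(tids) >= min_support:
                next_level[candidate] = tids
        level = next_level
    return frequent
```

For example, calling `mine_frequent_itemsets([{"a", "b"}, {"a", "b", "c"}, {"a", "c"}], 2)` touches the raw transactions once, in `to_vertical`; every later level works only on the tidsets kept from the previous level. In a distributed setting the per-level candidate work would be spread across workers, which this sketch does not attempt to show.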
