Article

A distributed frequent itemset mining algorithm using Spark for Big Data analytics

Publisher

SPRINGER
DOI: 10.1007/s10586-015-0477-1

Keywords

Distributed data mining algorithm; Frequent itemset mining; Big data; Spark

Funding

  1. Scientific Research Projects of the NSFC [61173015]
  2. Fundamental Research Funds for the Central Universities

Frequent itemset mining is an essential step in association rule mining. In the big data era, conventional approaches to mining frequent itemsets face significant challenges when computing power and memory are limited. This paper proposes an efficient distributed frequent itemset mining algorithm (DFIMA) that significantly reduces the number of candidate itemsets by applying a matrix-based pruning approach. The algorithm is implemented on Spark to further improve the efficiency of iterative computation. Numerical experiments on standard benchmark datasets, comparing the proposed algorithm with an existing algorithm, parallel FP-growth, show that DFIMA has better efficiency and scalability. In addition, a case study has been carried out to validate the feasibility of DFIMA.
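
To make the terms in the abstract concrete, the following is a minimal single-machine sketch of frequent itemset mining over a boolean item-by-transaction matrix with Apriori-style candidate pruning. It is not the DFIMA algorithm: the abstract does not give the details of the paper's matrix-based pruning or its Spark implementation, so the function name `frequent_itemsets`, the bitmask representation of matrix rows, and the pruning rule used here are all illustrative assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets.

    Illustrative sketch only (hypothetical helper, not the paper's DFIMA).
    """
    # One row per item of a boolean item-by-transaction matrix, stored as an
    # int bitmask: bit t is set when transaction t contains the item.
    rows = {}
    for t, items in enumerate(transactions):
        for item in items:
            rows[item] = rows.get(item, 0) | (1 << t)

    # Frequent 1-itemsets: items whose row has at least min_support set bits.
    current = {
        frozenset([item]): mask
        for item, mask in rows.items()
        if bin(mask).count("1") >= min_support
    }
    result = {s: bin(m).count("1") for s, m in current.items()}

    k = 1
    while current:
        candidates = {}
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k + 1 or union in candidates:
                continue
            # Apriori pruning: every k-subset of a candidate must be frequent.
            if any(frozenset(sub) not in current
                   for sub in combinations(union, k)):
                continue
            # Support = transactions containing all items (AND of the rows).
            mask = current[a] & current[b]
            if bin(mask).count("1") >= min_support:
                candidates[union] = mask
        result.update({s: bin(m).count("1") for s, m in candidates.items()})
        current = candidates
        k += 1
    return result
```

With five small transactions over the items `a`, `b`, `c` and a minimum support count of 3, each single item and each pair can be frequent while the triple is not, which shows how candidates are discarded level by level. A distributed version would partition the transactions or the candidate set across workers, but how DFIMA does this is not described in the abstract.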

