Article

A distributed frequent itemset mining algorithm using Spark for Big Data analytics

CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS (2015)

Journal

CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS

Volume 18, Issue 4, Pages 1493-1501

Publisher

SPRINGER

DOI: 10.1007/s10586-015-0477-1

Keywords

Distributed data mining algorithm; Frequent itemset mining; Big data; Spark

Funding

Scientific Research Projects of the NSFC [61173015]
Fundamental Research Funds for the Central Universities

Abstract

Frequent itemset mining is an essential step in association rule mining. In the big data era, conventional approaches to mining frequent itemsets face significant challenges when computing power and memory are limited. This paper proposes an efficient distributed frequent itemset mining algorithm (DFIMA) that significantly reduces the number of candidate itemsets by applying a matrix-based pruning approach. The algorithm has been implemented on Spark to further improve the efficiency of iterative computation. Numerical experiments on standard benchmark datasets, comparing DFIMA with an existing algorithm, parallel FP-growth, show that DFIMA has better efficiency and scalability. In addition, a case study was carried out to validate the feasibility of DFIMA.
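
The abstract does not describe how DFIMA's matrix-based pruning works, so the following is only a rough, single-machine illustration of the general idea under stated assumptions, not the authors' algorithm. It assumes the "matrix" is a co-occurrence matrix of 2-itemset supports: a candidate k-itemset (k >= 3) is discarded before counting if any pair of its items is infrequent according to that matrix. The function name `mine_frequent_itemsets` and the absolute-count `min_support` parameter are hypothetical. On Spark, the counting passes are the parts that would be distributed across partitions (for example with `map` and `reduceByKey` on an RDD); here everything stays in plain Python.

```python
from collections import Counter
from itertools import combinations


def mine_frequent_itemsets(transactions, min_support):
    """Hypothetical sketch: Apriori-style mining with a pair co-occurrence
    matrix used to prune candidates. Returns {frozenset: support_count}."""
    # Normalize each transaction to a sorted tuple of unique items.
    txs = [tuple(sorted(set(t))) for t in transactions]

    # Pass 1: count single items and keep the frequent ones.
    item_counts = Counter(item for t in txs for item in t)
    frequent_items = sorted(i for i, c in item_counts.items() if c >= min_support)
    index = {item: k for k, item in enumerate(frequent_items)}
    n = len(frequent_items)

    # Pass 2: build an n x n co-occurrence matrix (upper triangle only).
    # Items in each transaction are sorted, so their indices are increasing.
    matrix = [[0] * n for _ in range(n)]
    for t in txs:
        kept = [index[i] for i in t if i in index]
        for a, b in combinations(kept, 2):
            matrix[a][b] += 1

    result = {frozenset([i]): item_counts[i] for i in frequent_items}
    level = []
    for a, b in combinations(range(n), 2):
        if matrix[a][b] >= min_support:
            pair = (frequent_items[a], frequent_items[b])
            result[frozenset(pair)] = matrix[a][b]
            level.append(pair)

    def all_pairs_frequent(cand):
        # Matrix-based pruning: every 2-subset must be frequent.
        return all(matrix[index[x]][index[y]] >= min_support
                   for x, y in combinations(cand, 2))

    # Levels k >= 3: join, prune with the matrix, then count survivors.
    while level:
        level = sorted(level)
        candidates = set()
        for p, q in combinations(level, 2):
            if p[:-1] == q[:-1]:  # share the first k-2 items
                cand = p + (q[-1],)
                if all_pairs_frequent(cand):
                    candidates.add(cand)
        counts = Counter()
        for t in txs:
            tset = set(t)
            for cand in candidates:
                if tset.issuperset(cand):
                    counts[cand] += 1
        level = [c for c in candidates if counts[c] >= min_support]
        for c in level:
            result[frozenset(c)] = counts[c]
    return result


if __name__ == "__main__":
    baskets = [
        ["bread", "milk"],
        ["bread", "diaper", "beer", "eggs"],
        ["milk", "diaper", "beer", "cola"],
        ["bread", "milk", "diaper", "beer"],
        ["bread", "milk", "diaper", "cola"],
    ]
    for itemset, count in mine_frequent_itemsets(baskets, 3).items():
        print(sorted(itemset), count)
```

The pruning step only removes candidates that cannot be frequent, so the final counting pass still decides which itemsets survive; the saving comes from scanning the transactions for fewer candidates.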
