☆ 4.6 Article

PFIMD: a parallel MapReduce-based algorithm for frequent itemset mining

MULTIMEDIA SYSTEMS (2021)

Journal

MULTIMEDIA SYSTEMS

Volume 27, Issue 4, Pages 709-722

Publisher

SPRINGER

DOI: 10.1007/s00530-020-00725-x

Keywords

DiffNodeset structure; MapReduce; 2-Way comparison strategy; Load balancing strategy based on dynamic grouping; Frequent item mining

Funding

National Natural Science Foundation of China [41562019]
National Key Research and Development Program of China [2018YFC1504705]

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Automated Summary New
Abstract

This study presents an optimized parallel frequent itemset mining algorithm PFIMD based on MapReduce, addressing the time and space complexity as well as load balancing issues in existing parallel FIM algorithms. With the adoption of DiffNodeset structure and a 2-way comparison strategy, the algorithm's execution speed is accelerated. The algorithm is parallelized using the cloud computing platform Hadoop and the programming model MapReduce.

Frequent itemset mining (FIM) is a significant data mining technique which is widely adopted in numerous applications for exploring frequent items. With the rapid growth and expansion of datasets, FIM has become an interesting topic for many researchers, which has triggered many innovations of numerous FIM algorithms in the big data environment. This study aims to design an optimization parallel frequent itemset mining algorithm based on MapReduce, named as PFIMD algorithm, to deal with the problem of time and space complexity during processing and computing item sets, as well as the failure to adequately balance the load among parallel tasks in the existing parallel FIM algorithms. First, a structure called DiffNodeset is adopted for avoiding the increase of N-list cardinality in the MRPrePost algorithm effectively. Then, a 2-way comparison strategy is designed to speed up the DiffNodeset generation of 2-itemsets and reduce the time complexity of the algorithm. Finally, the steps of the improved algorithm are parallelized using the cloud computing platform Hadoop and the programming model MapReduce. Moreover, to achieve a uniform grouping of each item in F-list, a load balancing strategy based on dynamic grouping is proposed, which solves the problem of uneven load of each node in the cluster. The experimental results show that the modified algorithm not only overcomes the shortcoming of MRPrePost in the big data environment, but also greatly reduces the time and space complexity. Finally, the specific applications of PFIMD algorithm in several multimedia data sets are listed to illustrate its universality.

PFIMD: a parallel MapReduce-based algorithm for frequent itemset mining

Journal

MULTIMEDIA SYSTEMS

Publisher

SPRINGER

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

PFIMD: a parallel MapReduce-based algorithm for frequent itemset mining

Journal

MULTIMEDIA SYSTEMS

Publisher

SPRINGER

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper