4.5 Article

CL-MAX: a clustering-based approximation algorithm for mining maximal frequent itemsets

Journal

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s13042-020-01177-5

Keywords

Frequent itemset mining; Association rule mining; Partition-based method; MiniBatch K-means; Maximal frequent itemset mining

Funding

  1. University of Tehran [30764/1/02]

Ask authors/readers for more resources

The paper introduces an approximation algorithm for frequent itemset mining by converting the problem into a clustering problem, which significantly improves the algorithm's efficiency. Experimental results show that the proposed algorithm is almost always faster than existing deterministic algorithms while retaining up to 95% accuracy.
The problem of frequent itemset mining is one of the more important problems in data mining which has been extensively employed across a wide range of other relevant tasks such as market basket analysis in marketing, or text analysis in text mining applications. The majority of the deterministic frequent itemset mining algorithms which have been proposed in recent years use some sort or another of an optimal data structures to reduce the overall execution time of the algorithm. In this paper, however, we have tried instead to introduce an approximation algorithm which works by converting the problem into a clustering problem where similar transactions are grouped together. Each cluster centroid represents an itemset which may be assumed to be a candidate frequent itemsets. The validity of this assumption is simply verified by calculating the support count of these itemsets. Those who meet the min-support condition are considered to be an actual frequent itemset. As for the remaining itemsets, they are then passed to MAFIA which extract all maximal frequent itemsets therefrom. Experimentations made on several well-known and diverse datasets show that the proposed algorithm performs almost always faster, and in some cases up to 10 times faster, than the existing deterministic algorithms, and all this by retaining up to 95% of its accuracy.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.5
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available