Article

CL-MAX: a clustering-based approximation algorithm for mining maximal frequent itemsets

INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS (2021)

Journal

INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS

Volume 12, Issue 2, Pages 365-383

Publisher

SPRINGER HEIDELBERG

DOI: 10.1007/s13042-020-01177-5

Keywords

Frequent itemset mining; Association rule mining; Partition-based method; MiniBatch K-means; Maximal frequent itemset mining

Funding

University of Tehran [30764/1/02]

Automated Summary

The paper introduces an approximation algorithm for frequent itemset mining that recasts the problem as a clustering problem, which significantly improves efficiency. Experimental results show that the proposed algorithm is almost always faster than existing deterministic algorithms while retaining up to 95% accuracy.

Abstract

Frequent itemset mining is one of the more important problems in data mining and has been employed extensively in a wide range of related tasks, such as market basket analysis in marketing and text analysis in text mining. Most of the deterministic frequent itemset mining algorithms proposed in recent years rely on some form of optimized data structure to reduce overall execution time. In this paper, we instead introduce an approximation algorithm that converts the problem into a clustering problem in which similar transactions are grouped together. Each cluster centroid represents an itemset that is assumed to be a candidate frequent itemset. This assumption is verified by calculating the support count of each candidate: those that meet the min-support condition are considered actual frequent itemsets. The remaining itemsets are then passed to MAFIA, which extracts all maximal frequent itemsets from them. Experiments on several well-known and diverse datasets show that the proposed algorithm is almost always faster, in some cases up to 10 times faster, than existing deterministic algorithms, while retaining up to 95% accuracy.
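The clustering and verification steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes transactions are encoded as 0/1 item vectors, substitutes a plain Lloyd's k-means for the MiniBatch K-means named in the keywords, and assumes a centroid is turned into an itemset by keeping items whose centroid weight is at least 0.5. The function names, the `threshold` parameter, and the seeding rule are all hypothetical, and the MAFIA step applied to the remaining itemsets is omitted.

```python
from typing import FrozenSet, List, Sequence, Set

Transaction = FrozenSet[str]


def support_count(itemset: FrozenSet[str], transactions: Sequence[Transaction]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def kmeans_binary(vectors: List[List[float]], k: int, iterations: int = 10) -> List[List[float]]:
    """Plain Lloyd's k-means (a stand-in for MiniBatch K-means).

    Assumption: the first k distinct vectors are used as initial centroids.
    """
    centroids: List[List[float]] = []
    for v in vectors:
        if v not in centroids:
            centroids.append(list(v))
        if len(centroids) == k:
            break
    for _ in range(iterations):
        clusters: List[List[List[float]]] = [[] for _ in centroids]
        for v in vectors:
            # Squared Euclidean distance to each centroid; ties go to the first.
            dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def candidate_itemsets(
    transactions: Sequence[Transaction], k: int, threshold: float = 0.5
) -> Set[FrozenSet[str]]:
    """Cluster transactions and read one candidate itemset off each centroid."""
    items = sorted(set().union(*transactions))
    vectors = [[1.0 if item in t else 0.0 for item in items] for t in transactions]
    candidates: Set[FrozenSet[str]] = set()
    for centroid in kmeans_binary(vectors, k):
        # Assumed rule: an item belongs to the candidate if its weight >= threshold.
        itemset = frozenset(i for i, w in zip(items, centroid) if w >= threshold)
        if itemset:
            candidates.add(itemset)
    return candidates


def verified_frequent(
    transactions: Sequence[Transaction], k: int, min_support: int
) -> Set[FrozenSet[str]]:
    """Keep only the candidates whose support count meets min_support."""
    return {
        c for c in candidate_itemsets(transactions, k)
        if support_count(c, transactions) >= min_support
    }
```

Under these assumptions, the support check is what guards against a centroid that does not correspond to a genuinely frequent itemset: clustering only proposes candidates, and every candidate is still counted against the full transaction list before being accepted.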
