4.7 Article

Efficient algorithms for mining closed and maximal high utility itemsets

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 257, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.knosys.2022.109921

Keywords

Utility mining; High utility itemset; Closed high utility itemset; Maximal high utility itemset; Upper bound; Weak upper bound; Pruning strategy

Ask authors/readers for more resources

Closed high utility itemsets (CHUIs) and maximal high utility itemsets (MaxHUIs) are important concise representations of HUIs. Mining these representations is crucial for generating meaningful high utility association rules. However, existing algorithms suffer from long runtimes, high memory usage, and scalability issues. To address this, this paper proposes two efficient algorithms that can mine these representations faster.
Closed high utility itemsets (CHUIs) and maximal high utility itemsets (MaxHUIs) are two important concise representations of HUIs. Discovering these itemsets is important because they are lossless and compact, i.e., they provide a concise summary of all HUIs that can be orders of magnitude smaller. In addition, it can be more efficient to extract these representations than it would be to extract all HUIs. Mining the concise representations of HUIs is also an important step toward the generation of nonredundant high utility association rules that can reveal meaningful information to decision-makers. However, although several algorithms have been designed to mine these representations, such as EFIM-Closed, HMiner-Closed, and CHUI-Miner(Max), they have long runtimes, high memory usage, and scalability issues, especially for dense and large datasets. To address this issue, this paper proposes two efficient algorithms named C-HUIM and MaxC-HUIM for mining CHUIs, and simultaneously mining both CHUIs and MaxHUIs, respectively. These algorithms use a novel weak upper bound on the utility, which is strictly tighter than traditional upper bounds, and a corresponding pruning strategy called SPWUB to quickly eliminate low utility itemsets. The algorithms also include two novel search space reduction strategies named PSNonCHUB and LPSNonCHUB. The PSNonCHUB strategy only requires checking the inclusion relationship among a small number of itemsets, while LPSNonCHUB does not perform any inclusion check. In addition, the algorithms adopt a structure named MPUN-list to efficiently store and calculate information about each itemset's utility and support. Experimental results show that the proposed algorithms can be more than 100 times faster, are more memory efficient, and have better scalability than the state-of-the-art algorithms. (c) 2022 Elsevier B.V. All rights reserved.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available