Article

An efficient algorithm for mining closed high utility itemsets over data streams with one dataset scan

Journal

KNOWLEDGE AND INFORMATION SYSTEMS
Volume 65, Issue 1, Pages 207-240

Publisher

SPRINGER LONDON LTD
DOI: 10.1007/s10115-022-01763-9

Keywords

Itemsets mining; Data streams mining; High utility itemsets; Closed high utility itemsets

This paper proposes a new algorithm, CHUIDS_OSc, for mining closed high utility itemsets over data streams, which achieves mining with only one scan of the original dataset. It introduces a new utility-list structure for efficient construction and update of batch information, and applies effective pruning strategies to improve the efficiency of the closed itemsets mining process.
Mining high utility itemsets over data streams produces many redundant itemsets. To remove this redundancy, researchers have proposed mining closed high utility itemsets instead: there are far fewer of them than there are high utility itemsets, and the representation is lossless. However, the existing algorithm for mining closed high utility itemsets over data streams must scan the dataset twice, and an algorithm that requires multiple scans cannot meet the real-time processing requirements of a streaming environment. To solve this problem, this paper proposes a new algorithm, CHUIDS_OSc, that mines closed high utility itemsets over data streams with only one scan of the original dataset. CHUIDS_OSc uses a newly designed utility-list structure that can quickly construct and update batch information without rescanning the original dataset. In addition, effective pruning strategies are applied to speed up the closed itemset mining process and eliminate potential low utility candidates. Experimental evaluations show the efficiency and feasibility of the algorithm for scanning and processing datasets. In terms of running time, it outperforms previously proposed closed high utility itemsets mining algorithms over data streams that require multiple scans.
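
To make the terms in the abstract concrete, the following is a minimal brute-force sketch of what a closed high utility itemset is. It is not CHUIDS_OSc and uses neither the paper's utility-list structure nor its pruning strategies. It only applies the standard definitions: an itemset is high utility if its total utility reaches a minimum threshold, and closed if no proper superset has the same support. The toy transactions, the threshold `MIN_UTIL`, and all function names are hypothetical and chosen for illustration.

```python
from itertools import combinations

# Hypothetical toy batch: each transaction maps item -> utility
# (for example, purchased quantity times unit profit).
TRANSACTIONS = [
    {"a": 5, "c": 1},
    {"a": 5, "b": 2, "c": 6},
    {"b": 4, "c": 3},
    {"a": 10, "c": 3},
]
MIN_UTIL = 15  # assumed minimum utility threshold


def support_and_utility(itemset, db):
    """Return (support, utility) of `itemset` in `db`.

    Support counts the transactions containing every item of the itemset;
    utility sums the utilities of those items over the same transactions.
    """
    support, utility = 0, 0
    for transaction in db:
        if all(item in transaction for item in itemset):
            support += 1
            utility += sum(transaction[item] for item in itemset)
    return support, utility


def closed_high_utility_itemsets(db, min_util):
    """Enumerate every itemset (exponential; for illustration only)."""
    items = sorted({item for transaction in db for item in transaction})
    stats = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            support, utility = support_and_utility(combo, db)
            if support > 0:
                stats[frozenset(combo)] = (support, utility)

    result = {}
    for itemset, (support, utility) in stats.items():
        if utility < min_util:
            continue  # not high utility
        # Closed: no proper superset occurs in exactly the same number
        # of transactions.
        has_equal_superset = any(
            itemset < other and other_support == support
            for other, (other_support, _) in stats.items()
        )
        if not has_equal_superset:
            result[itemset] = (support, utility)
    return result


if __name__ == "__main__":
    for itemset, (support, utility) in closed_high_utility_itemsets(
        TRANSACTIONS, MIN_UTIL
    ).items():
        print(sorted(itemset), "support:", support, "utility:", utility)
```

In this toy batch, {a}, {a, c}, and {b, c} are all high utility, but {a} is redundant: it occurs in exactly the same three transactions as {a, c}, so only {a, c} and {b, c} are reported as closed. A streaming algorithm has to reach the same answer without enumerating every itemset and without rescanning earlier batches.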
