Article

Mining discriminative itemsets in data streams using the tilted-time window model

Journal

KNOWLEDGE AND INFORMATION SYSTEMS
Volume 63, Issue 5, Pages 1241-1270

Publisher

SPRINGER LONDON LTD
DOI: 10.1007/s10115-021-01550-y

Keywords

Data stream mining; Discriminative itemsets; Prefix tree; Tilted-time window model

This paper discusses the importance of mining discriminative itemsets in data streams and proposes a method based on the tilted-time window model. The efficient and highly accurate H-DISSparse algorithm addresses the challenges of the discriminative itemset mining process, using dynamically adjusted data structures to improve performance.
A discriminative itemset is an itemset that is frequent in the target data stream and has a much higher frequency there than in the rest of the data streams in the dataset. Discriminative itemsets describe the distinguishing features between data streams. Mining them in data streams is important because transactions arrive continuously, at high speed and in large volume. Compared with frequent itemset mining in a single data stream, discriminative itemset mining poses additional challenges because the Apriori subset property is not applicable. We propose an efficient and highly accurate method for mining discriminative itemsets in data streams using a tilted-time window model. The proposed single-pass H-DISSparse algorithm is designed around several well-defined characteristics that aim to improve the approximate frequencies of the itemsets in the tilted-time window model. The data structures are dynamically adjusted in offline time intervals to reflect the discriminative itemset frequencies in different time periods in unsynchronized data streams. Empirical analysis shows that the proposed method is efficient in time and space on fast-growing big data streams.
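
To make the definition above concrete, the following is a minimal brute-force sketch in Python. It is not the H-DISSparse algorithm and uses no tilted-time windows. It only illustrates what "discriminative" means and why the Apriori subset property cannot be used for pruning. The function names, the thresholds `min_support` and `min_ratio`, and the toy transactions are all assumptions made for illustration; the paper's exact threshold definitions may differ.

```python
from collections import Counter
from itertools import combinations


def itemset_counts(transactions, max_size=3):
    """Count every itemset of up to max_size items (brute force, illustration only)."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, min(max_size, len(items)) + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return counts


def discriminative_itemsets(target, rest, min_support=0.4, min_ratio=2.0, max_size=3):
    """Return itemsets that are frequent in `target` and much more frequent there than in `rest`.

    min_support and min_ratio are hypothetical thresholds chosen for this sketch.
    """
    target_counts = itemset_counts(target, max_size)
    rest_counts = itemset_counts(rest, max_size)
    result = {}
    for itemset, count in target_counts.items():
        target_freq = count / len(target)
        if target_freq < min_support:
            continue  # not frequent in the target stream
        rest_freq = rest_counts.get(itemset, 0) / len(rest)
        ratio = float("inf") if rest_freq == 0 else target_freq / rest_freq
        if ratio >= min_ratio:
            result[itemset] = (target_freq, rest_freq)
    return result


# Toy data (assumed): "a" and "b" co-occur far more often in the target stream.
target = [["a", "b"], ["a", "b"], ["a", "b"], ["a", "c"], ["b", "c"]]
rest = [["a"], ["a", "c"], ["b"], ["b", "c"], ["a", "b"]]

found = discriminative_itemsets(target, rest)
print(sorted(found))
```

On this toy data only the pair of "a" and "b" qualifies: it appears in 3 of 5 target transactions but in 1 of 5 of the others. Neither "a" nor "b" alone is discriminative, since each is about as common in both streams. A discriminative itemset can therefore have non-discriminative subsets, which is why subset-based Apriori pruning does not carry over directly.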