Article

Mining top-k high-utility itemsets from a data stream under sliding window model

APPLIED INTELLIGENCE (2017)

Journal

APPLIED INTELLIGENCE

Volume 47, Issue 4, Pages 1240-1255

Publisher

SPRINGER

DOI: 10.1007/s10489-017-0939-7

Keywords

Data mining; Pattern mining; Utility mining; Data streams; Top-k high utility mining

Funding

Infosys Centre for Artificial Intelligence
Indraprastha Institute of Information Technology Delhi (IIIT-Delhi)
Visvesvaraya Ph.D scheme for Electronics and IT

Abstract

High-utility itemset mining has gained significant attention in the past few years. It aims to find sets of items (itemsets) in a database whose utility is no less than a user-defined threshold. The notion of utility gives an analyst more flexibility to mine relevant itemsets. Nowadays, continuous and unbounded streams of data are generated from web clicks, transaction flows from retail stores, sensor networks, etc. Mining high-utility itemsets from a data stream is challenging because the incoming data has to be processed on the fly under time and memory constraints. The number of high-utility itemsets depends on the user-defined threshold: a very low threshold can generate a large number of itemsets, while a very high threshold can generate few or none. Setting a threshold that yields a reasonable number of itemsets can therefore be tedious. Top-k high-utility itemset mining was introduced to address this issue, where k is the user-specified number of high-utility itemsets in the result set. In this paper, we propose a data structure and an efficient algorithm for mining top-k high-utility itemsets from a data stream. The algorithm has a single phase and does not generate any candidates, unlike many algorithms that work in two phases, i.e., candidate generation followed by candidate verification. We conduct extensive experiments on several real and synthetic datasets. Experimental results demonstrate that our proposed algorithm is 20 to 80 times faster on sparse datasets and 300 to 700 times faster on dense datasets than the state-of-the-art algorithm. Furthermore, our proposed algorithm requires less memory than the state-of-the-art algorithm.
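To make the problem statement concrete, the following is a minimal brute-force sketch of top-k high-utility itemset mining over a sliding window. It is not the data structure or single-phase algorithm proposed in the paper, which the abstract does not detail. It enumerates every itemset in the current window, which is exponential in the number of distinct items and only workable for toy inputs. The names `itemset_utility`, `top_k_high_utility` and `stream_top_k` are hypothetical. The sketch also assumes that each transaction is a dict mapping an item to its utility in that transaction, that the window holds a fixed number of transactions, and that ties are broken lexicographically.

```python
from collections import deque
from itertools import combinations


def itemset_utility(itemset, window):
    """Utility of `itemset`: summed over transactions containing all its items."""
    total = 0
    for tx in window:
        if all(item in tx for item in itemset):
            total += sum(tx[item] for item in itemset)
    return total


def top_k_high_utility(window, k):
    """Brute force (illustration only): score every itemset, keep the k best."""
    items = sorted({item for tx in window for item in tx})
    scored = []
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, window)
            if utility > 0:  # skip itemsets that never occur in the window
                scored.append((utility, itemset))
    # Highest utility first; ties broken by itemset order (an assumption).
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


def stream_top_k(stream, window_size, k):
    """Yield the top-k itemsets each time the sliding window is full."""
    window = deque(maxlen=window_size)  # the oldest transaction drops out automatically
    for tx in stream:
        window.append(tx)
        if len(window) == window_size:
            yield top_k_high_utility(window, k)


if __name__ == "__main__":
    # Toy stream: item -> utility of that item in the transaction (assumed format).
    stream = [
        {"a": 5, "b": 2},
        {"b": 4, "c": 3},
        {"a": 1, "c": 6},
        {"a": 2, "b": 2, "c": 1},
    ]
    for result in stream_top_k(stream, window_size=3, k=2):
        print(result)
```

Because the user fixes k rather than a utility threshold, the result size is known in advance, which is the motivation the abstract gives for the top-k formulation.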
