☆ 4.7 Article

A false negative approach to mining frequent itemsets from high speed transactional data streams

INFORMATION SCIENCES (2006)

期刊

INFORMATION SCIENCES

卷 176, 期 14, 页码 1986-2015

出版社

ELSEVIER SCIENCE INC

DOI: 10.1016/j.ins.2005.11.003

关键词

data stream; frequent pattern mining; memory minimization

类别

Computer Science, Information Systems

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

Mining frequent itemsets from transactional data streams is challenging due to the nature of the exponential explosion of itemsets and the limit memory space required for mining frequent itemsets. Given a domain of I unique items, the possible number of itemsets can be up to 2(I) - 1. When the length of data streams approaches to a very large number N, the possibility of an itemset to be frequent becomes larger and difficult to track with limited memory. The existing studies on finding frequent items from high speed data streams are false-positive oriented. That is, they control memory consumption in the counting processes by an error parameter epsilon, and allow items with support below the specified minimum supports but above s - epsilon counted as frequent ones. However, such false-positive oriented approaches cannot be effectively applied to frequent itemsets mining for two reasons. First, false-positive items found increase the number of false-positive frequent itemsets exponentially. Second, minimization of the number of false-positive items found, by using a small 6, will make memory consumption large. Therefore, such approaches may make the problem computationally intractable with bounded memory consumption. In this paper, we developed algorithms that call effectively mine frequent item(set)s from high speed transactional data streams with a bound of memory consumption. Our algorithms are based on Chernoff bound in which we use a running error parameter to prune itein(set)s and use a reliability parameter to control memory. While our algorithms are false-negative oriented, that is, certain frequent itemsets may not appear in the results, the number of false-negative itemsets can be controlled by a predefined parameter so that desired recall rate of frequent itemsets can be guaranteed. Our extensive experimental studies show that the proposed algorithms have high accuracy, require less memory, and consume less CPU time. They significantly outperform the existing false-positive algorithms. (C) 2005 Elsevier Inc. All rights reserved.

A false negative approach to mining frequent itemsets from high speed transactional data streams

期刊

INFORMATION SCIENCES

出版社

ELSEVIER SCIENCE INC

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

A false negative approach to mining frequent itemsets from high speed transactional data streams

期刊

INFORMATION SCIENCES

出版社

ELSEVIER SCIENCE INC

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文