Article

Efficient strategies for incremental mining of frequent closed itemsets over data streams

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 191

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2021.116220

Keywords

Data streams; Closed itemsets; Frequent itemsets; Data mining; Knowledge discovery

Funding

  1. National Natural Science Foundation of China [62172357]
  2. Zhejiang Provincial Natural Science Foundation of China [LY17F020004]
  3. Key Projects of Zhejiang Science and Technology Plan of China [2018C01084]
  4. Zhejiang Provincial Key Laboratory of New Network Standards and Technologies, China [2013E10012]

The paper proposes an efficient and scalable algorithm for mining frequent closed itemsets over data streams. Using an indexed prefix closed itemset tree and new search strategies that prune the search space, it outperforms existing intersection-based and pattern enumeration algorithms in efficiency and scalability.
Mining frequent closed itemsets over data streams is an important data mining problem. Mining data streams is more challenging than mining static data because of the nature of data streams: a high arrival rate, a massive volume of incoming data, and concept drift. Existing algorithms for mining frequent closed itemsets over data streams suffer from scalability and efficiency bottlenecks. This paper proposes a novel algorithm for mining frequent closed itemsets over data streams under both the sliding window model and the landmark model. An indexed prefix closed itemset tree is proposed to compress all closed itemsets and to search them quickly, and novel search strategies are proposed to prune the search space when updating the set of closed itemsets. The proposed algorithm outperforms the state-of-the-art intersection-based algorithms CICLAD, ConPatSet, and CloStream by a factor of several to two orders of magnitude in efficiency, and outperforms the state-of-the-art pattern enumeration algorithm Moment by up to two orders of magnitude on data streams with large windows and on sparse data streams. The proposed algorithm is also superior in scalability.
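
To make the problem concrete, the following is a minimal Python sketch of a naive intersection-based incremental update of closed itemsets under the landmark model (transactions are only appended, never expired). It is not the algorithm proposed in the paper: it uses a flat dictionary rather than an indexed prefix closed itemset tree, applies no pruning, and does not handle the deletions that a sliding window requires. The names add_transaction and frequent_closed are hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, Iterable

Itemset = FrozenSet[str]


def add_transaction(closed: Dict[Itemset, int], transaction: Iterable[str]) -> None:
    """Update closed itemsets and their supports after one new transaction.

    Every closed itemset of the extended stream is either an old closed
    itemset, the new transaction itself, or the intersection of the new
    transaction with an old closed itemset.
    """
    t = frozenset(transaction)
    if not t:
        return
    # The transaction itself is closed; support 1 unless raised below.
    updates: Dict[Itemset, int] = {t: 1}
    for itemset, support in closed.items():
        common = itemset & t
        if common:
            # The old support of `common` is the largest support among the
            # old closed itemsets that produce it; the new transaction adds 1.
            updates[common] = max(updates.get(common, 0), support + 1)
    closed.update(updates)


def frequent_closed(closed: Dict[Itemset, int], min_support: int) -> Dict[Itemset, int]:
    """Keep only the closed itemsets whose support reaches min_support."""
    return {s: n for s, n in closed.items() if n >= min_support}


if __name__ == "__main__":
    closed: Dict[Itemset, int] = {}
    for tx in (["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]):
        add_transaction(closed, tx)
    for itemset, support in sorted(frequent_closed(closed, 2).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), support)
```

Each update in this sketch scans every stored closed itemset, so its cost grows with the size of the closed set. That full scan is the kind of work an index and search-space pruning are meant to avoid.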
