4.8 Article

Fast Eclat Algorithms Based on Minwise Hashing for Large Scale Transactions

期刊

IEEE INTERNET OF THINGS JOURNAL
卷 6, 期 2, 页码 3948-3961

出版社

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/JIOT.2018.2885851

关键词

Eclat; fast algorithm; frequent itemset; minwise hashing

资金

  1. National Key Research and Development Program of China [2016YFB0800900]

向作者/读者索取更多资源

The Eclat algorithm is one of the most widely used frequent itemset mining methods. In the normal Eclat algorithm and its variants, it is inefficient to calculate the intersection size of itemsets by sequentially comparing elements, especially for large scale transactions. In this paper, we propose the fast Eclat algorithms that can quickly calculate the intersection size of multiple itemsets by using minwise hashing and the estimators. Minwise hashing is used to calculate the Jaccard similarity coefficient by mapping the elements of the sets to those of smaller sets. Two estimators are used to estimate the intersection size of itemsets based on the Jaccard similarity coefficient. Due to the imperfect hash function, minwise hashing may obtain a biased Jaccard similarity, which results in error between the real value and the estimated value of the intersection size. Thus, we proposed the HashEclat which uses the maximum of vertical bar A vertical bar and vertical bar B vertical bar to represent the union size vertical bar A boolean OR B vertical bar, and proposed the Sim-Eclat which uses the minimum of vertical bar A vertical bar and vertical bar B vertical bar to represent the intersection size vertical bar A boolean AND B vertical bar. Furthermore, we use a boundary error E for better performance as follows: if E is large, the intersection size is determined by a traditional method, and the result is more accurate but takes longer to compute; otherwise, it will be the opposite. Both the theoretical analysis and experimental results show that the proposed algorithms can obtain almost all frequent itemsets with higher speed and less memory usage than other algorithms.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.8
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据