Article

EFIM: a fast and memory efficient algorithm for high-utility itemset mining

KNOWLEDGE AND INFORMATION SYSTEMS (2017)

Journal

KNOWLEDGE AND INFORMATION SYSTEMS

Volume 51, Issue 2, Pages 595-625

Publisher

SPRINGER LONDON LTD

DOI: 10.1007/s10115-016-0986-0

Keywords

Pattern mining; Itemset mining; High-utility mining; Fast Utility Counting; High-utility database merging and projection

Categories

Computer Science, Artificial Intelligence; Computer Science, Information Systems

Funding

NSERC Discovery grant from the Government of Canada
Harbin Institute of Technology (Shenzhen)


Abstract

In recent years, high-utility itemset mining has emerged as an important data mining task. However, it remains computationally expensive in terms of both runtime and memory consumption. It is thus an important challenge to design more efficient algorithms for this task. In this paper, we address this issue by proposing a novel algorithm named EFIM (EFficient high-utility Itemset Mining), which introduces several new ideas to discover high-utility itemsets more efficiently. EFIM relies on two new upper bounds, named revised sub-tree utility and local utility, to prune the search space more effectively. It also introduces a novel array-based utility counting technique named Fast Utility Counting to calculate these upper bounds in linear time and space. Moreover, to reduce the cost of database scans, EFIM proposes efficient database projection and transaction merging techniques named High-utility Database Projection (HDP) and High-utility Transaction Merging (HTM), also performed in linear time. An extensive experimental study on various datasets shows that EFIM is in general two to three orders of magnitude faster than the state-of-the-art algorithms d2HUP, HUI-Miner, HUP-Miner, FHM and UP-Growth+ on dense datasets and performs quite well on sparse datasets. Moreover, a key advantage of EFIM is its low memory consumption.
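To make the array-based counting and transaction merging ideas in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes items are small non-negative integer IDs, that each transaction is a list of (item, utility) pairs in a fixed item order, and that the bound being computed is the one for the empty prefix, where each bin holds the summed utility of all transactions containing the item. The function names `utility_bins` and `merge_identical`, the toy database `db`, and the threshold `min_util` are all hypothetical. The merging step uses a dictionary keyed on the item list, which is a simple stand-in rather than the linear-time procedure the paper describes.

```python
def utility_bins(transactions, n_items):
    """Array-based counting: one pass over the database.

    bins[i] ends up as the sum of the total utilities of every
    transaction that contains item i (an upper bound on the utility
    of any itemset containing i, for the empty prefix).
    """
    bins = [0] * n_items
    for t in transactions:
        transaction_utility = sum(u for _, u in t)
        for item, _ in t:
            bins[item] += transaction_utility
    return bins


def merge_identical(transactions):
    """Merge transactions with the same item list by summing utilities.

    Assumption: a dict keyed on the item tuple stands in for the
    merging procedure; items inside each transaction share one order.
    """
    merged = {}
    for t in transactions:
        key = tuple(item for item, _ in t)
        if key in merged:
            merged[key] = [a + u for a, (_, u) in zip(merged[key], t)]
        else:
            merged[key] = [u for _, u in t]
    return [list(zip(key, utils)) for key, utils in merged.items()]


# Hypothetical toy database: four transactions over items 0..3.
db = [
    [(0, 5), (1, 2), (2, 1)],
    [(0, 10), (2, 6)],
    [(0, 5), (1, 2), (2, 1)],  # same items as the first transaction
    [(1, 4), (3, 3)],
]

min_util = 30  # hypothetical minimum utility threshold

bins = utility_bins(db, n_items=4)
# Items whose bin falls below the threshold cannot appear in any
# high-utility itemset, so they can be dropped before the search.
promising = [item for item, bound in enumerate(bins) if bound >= min_util]

merged_db = merge_identical(db)

print(bins)            # [32, 23, 32, 7]
print(promising)       # [0, 2]
print(len(merged_db))  # 3
```

Merging leaves the bins unchanged, since the summed utilities are preserved, while reducing the number of transactions that later scans have to visit.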
