Article

Fuzzy high-utility pattern mining in parallel and distributed Hadoop framework

Journal

INFORMATION SCIENCES
Volume 553, Pages 31-48

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2020.12.004

Keywords

Hadoop; High fuzzy utility pattern; High utility itemset mining; Big-data; Fuzzy-set theory; MapReduce

High-utility itemset mining (HUIM) has gained attention because it highlights more important information than frequent itemset mining (FIM), yet its methodology remains similar to that of FIM. Previous studies focused on small datasets, which is unrealistic for today's big-data environments. This work combines fuzzy-set theory with a MapReduce framework to design a novel high fuzzy utility pattern mining algorithm, which performs strongly in mining high fuzzy utility patterns.
Over the past decade, high-utility itemset mining (HUIM) has received widespread attention because it can emphasize more critical information than was previously possible with frequent itemset mining (FIM). Unfortunately, HUIM remains very similar to FIM, since it determines itemsets using a binary model based on a pre-defined minimum utility threshold. Additionally, most previous HUIM work focused only on single, small datasets, which is unrealistic for today's real-world scenarios involving big data. In this work, fuzzy-set theory and a MapReduce framework are combined to design a novel high fuzzy utility pattern mining algorithm that resolves these issues. Fuzzy-set theory is introduced first, and a new algorithm called efficient high fuzzy utility itemset mining (EFUPM) is designed to discover high fuzzy utility patterns on a single machine. Two upper bounds are then estimated to allow early pruning of unpromising candidates in the search space. To handle large-scale datasets, a Hadoop-based high fuzzy utility pattern mining (HFUPM) algorithm is then developed to discover high fuzzy utility patterns on the Hadoop framework. Experimental results show that the proposed algorithms mine the required high fuzzy utility patterns efficiently, both on a single machine and in a large-scale environment, compared with the current state-of-the-art approaches. © 2020 The Author(s). Published by Elsevier Inc.
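To make the idea of a fuzzy utility and an upper bound for pruning more concrete, here is a minimal Python sketch. It is not the EFUPM or HFUPM algorithm from the paper, and the abstract does not give the paper's definitions. Everything in it is an assumption chosen for illustration: the triangular membership functions in `REGIONS`, the toy `PROFIT` table and `DB` transactions, the rule that an itemset's fuzzy utility in a transaction is the minimum membership degree times the summed item utilities, and the transaction-utility bound in `upper_bound`.

```python
# Hypothetical fuzzy regions for purchase quantities: (a, b, c) triangles.
# These values are illustrative assumptions, not taken from the paper.
REGIONS = {"low": (0, 1, 3), "mid": (1, 3, 5), "high": (3, 5, 7)}

# Hypothetical unit profits and a toy quantitative transaction database.
PROFIT = {"A": 3, "B": 1, "C": 5}
DB = [
    {"A": 1, "B": 3},
    {"A": 2, "C": 5},
    {"A": 2, "B": 3, "C": 1},
]


def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzy_utility(fuzzy_itemset, db):
    """Sum, over supporting transactions, of min membership * utility.

    fuzzy_itemset is a list of (item, region) pairs, e.g. ("A", "low").
    This combination rule is an assumed definition for the sketch.
    """
    total = 0.0
    for tx in db:
        if not all(item in tx for item, _ in fuzzy_itemset):
            continue
        degree = min(
            triangular(tx[item], *REGIONS[region])
            for item, region in fuzzy_itemset
        )
        utility = sum(tx[item] * PROFIT[item] for item, _ in fuzzy_itemset)
        total += degree * utility
    return total


def upper_bound(items, db):
    """Total utility of every transaction that contains all the items.

    Membership degrees are at most 1 and an itemset's utility is at most
    the transaction's utility, so this never underestimates fuzzy_utility.
    """
    return sum(
        sum(q * PROFIT[i] for i, q in tx.items())
        for tx in db
        if all(i in tx for i in items)
    )


pattern = [("A", "low"), ("B", "mid")]
fu = fuzzy_utility(pattern, DB)        # 10.5 on the toy data
bound = upper_bound({"A", "B"}, DB)    # 20 on the toy data
```

Under these assumptions, a candidate whose `upper_bound` falls below the minimum threshold can be discarded together with its supersets without computing any membership degrees, since adding items can only shrink the set of supporting transactions. That is the general role an upper bound plays in early pruning.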
