Article

FiDoop: Parallel Mining of Frequent Itemsets Using MapReduce

IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS (2016)

Journal

IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS

Volume 46, Issue 3, Pages 313-325

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TSMC.2015.2437327

Keywords

Frequent itemsets; frequent items ultrametric tree (FIU-tree); Hadoop cluster; load balance; MapReduce

Categories

Automation & Control Systems; Computer Science, Cybernetics

Funding

U.S. National Science Foundation [CCF-0845257, CNS-0917137, CNS-0757778, CCF-0742187, CNS-0831502, CNS-0855251, DUE-0837341, OCI-0753305, DUE-0830831]
National Natural Science Foundation of China [61272263]

Abstract

Existing parallel mining algorithms for frequent itemsets lack a mechanism that enables automatic parallelization, load balancing, data distribution, and fault tolerance on large clusters. As a solution to this problem, we design a parallel frequent itemsets mining algorithm called FiDoop using the MapReduce programming model. To achieve compressed storage and avoid building conditional pattern bases, FiDoop incorporates the frequent items ultrametric tree (FIU-tree) rather than conventional FP trees. In FiDoop, three MapReduce jobs are implemented to complete the mining task. In the crucial third MapReduce job, the mappers independently decompose itemsets, while the reducers perform combination operations by constructing small ultrametric trees and then mine these trees separately. We implement FiDoop on our in-house Hadoop cluster. We show that FiDoop on the cluster is sensitive to data distribution and dimensions, because itemsets with different lengths have different decomposition and construction costs. To improve FiDoop's performance, we develop a workload balance metric to measure load balance across the cluster's computing nodes. We develop FiDoop-HD, an extension of FiDoop, to speed up the mining performance for high-dimensional data analysis. Extensive experiments using real-world celestial spectral data demonstrate that our proposed solution is efficient and scalable.
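
The decompose-then-combine pattern of the third job can be illustrated with a small single-process sketch. This is not the authors' implementation: it assumes that "decomposing" a transaction means emitting each of its sub-itemsets keyed by length, it substitutes a plain `Counter` for the small ultrametric trees, and every function name (`map_decompose`, `reduce_count`, `mine`) is hypothetical. The frequent-item step stands in for the first two jobs, whose details the abstract does not give.

```python
from collections import Counter, defaultdict
from itertools import combinations


def map_decompose(transaction, frequent_items):
    """Mapper sketch (assumed behavior): drop infrequent items, then emit
    every sub-itemset of size >= 2, keyed by its length k."""
    pruned = sorted(set(transaction) & frequent_items)
    for k in range(2, len(pruned) + 1):
        for subset in combinations(pruned, k):
            yield k, subset


def reduce_count(itemsets, min_support):
    """Reducer sketch: count identical k-itemsets and keep the frequent ones.
    A Counter stands in for the small ultrametric tree named in the abstract."""
    counts = Counter(itemsets)
    return {s: c for s, c in counts.items() if c >= min_support}


def mine(transactions, min_support):
    # Stand-in for the earlier jobs: find the frequent single items.
    item_counts = Counter(i for t in transactions for i in set(t))
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}

    # Simulated shuffle: group mapper output by itemset length k.
    groups = defaultdict(list)
    for t in transactions:
        for k, subset in map_decompose(t, frequent_items):
            groups[k].append(subset)

    # One simulated reducer call per itemset length.
    result = {(i,): item_counts[i] for i in frequent_items}
    for itemsets in groups.values():
        result.update(reduce_count(itemsets, min_support))
    return result


transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]
frequent = mine(transactions, min_support=2)
```

Because a transaction with n frequent items yields on the order of 2^n sub-itemsets in this sketch, long transactions cost far more to decompose than short ones, which is consistent with the abstract's remark that itemsets of different lengths have different decomposition and construction costs.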
