4.6 Article

A Distributed Method for Fast Mining Frequent Patterns From Big Data

Journal

IEEE ACCESS
Volume 9, Issue -, Pages 135144-135159

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2021.3115514

Keywords

Data mining; Distributed databases; Itemsets; Costs; Memory management; Big Data; Artificial intelligence; Parallel algorithms; Distributed computing

Funding

  1. Ministry of Science and Technology of Taiwan [109-2221-E-992-072-MY3]

In recent years, knowledge discovery in databases has provided powerful capabilities for discovering meaningful information, leading to a focus on distributed data mining as an important research area. The proposed algorithms based on FP growth offer fast and scalable service in distributed computing environments, showing superior cost-effectiveness and performance.
In recent years, knowledge discovery in databases has provided a powerful capability to discover meaningful and useful information. Frequent pattern mining and association rule mining have been studied extensively for numerous real-life applications. Traditional mining algorithms assume that data are centralized and memory-resident. When these methods are applied to distributed databases, especially in the era of big data, their performance suffers because of the large data volume, bandwidth limitations, and energy limitations. Hence, data mining in distributed environments has emerged as an important research area. To improve performance, we propose a set of FP-growth-based algorithms that discover frequent patterns (FPs) and provide fast and scalable service in distributed computing environments, together with a compact data structure that stores items and counts to minimize the data transmitted over the network. To ensure completeness and execution capability, DistEclat and BigFIM were used for the experimental comparison. Experiments show that the proposed method is more cost-effective when processing massive datasets and performs well under various experimental conditions. On average, the proposed method required only 33% of the execution time and 45% of the transmission cost of DistEclat, and 23.3% of the execution time and 14.2% of the transmission cost of BigFIM.
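The abstract does not describe the data structure in detail, so the following is only a minimal sketch of the general idea: each node sends compact (item, count) pairs instead of raw transactions. It covers only the first counting pass that FP-growth-style methods typically start with, not the authors' algorithm. All function names, the sample partitions, and the `min_support` threshold are assumptions made for illustration.

```python
from collections import Counter

def local_item_counts(transactions):
    """Count, on one node, how many local transactions contain each item."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))  # count each item once per transaction
    return counts

def encode_counts(counts):
    """Hypothetical wire format: a sorted list of (item, count) pairs.

    Only this summary crosses the network, not the raw transactions.
    """
    return sorted(counts.items())

def merge_counts(messages):
    """Sum the per-node summaries into global item counts on a coordinator."""
    total = Counter()
    for message in messages:
        for item, count in message:
            total[item] += count
    return total

def frequent_items(total, min_support):
    """Keep the items whose global count reaches the support threshold."""
    return {item: count for item, count in total.items() if count >= min_support}

# Two illustrative partitions, standing in for two nodes (made-up data).
partitions = [
    [{"a", "b", "c"}, {"a", "c"}, {"b", "d"}],
    [{"a", "d"}, {"a", "b", "c"}, {"c"}],
]

messages = [encode_counts(local_item_counts(p)) for p in partitions]
global_counts = merge_counts(messages)
result = frequent_items(global_counts, min_support=3)
print(result)
```

In this sketch the size of each message depends on the number of distinct items on a node rather than on the number of transactions, which is one plausible way a count-based summary could reduce transmission cost.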
