Article

Sampling scheme-based classification rule mining method using decision tree in big data environment

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 244, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.knosys.2022.108522

Keywords

Classification rules; Decision tree; Sampling; Reliability; Big data

Funding

  1. National Natural Science Foundation of China [72101082, 71771078, 71371064]
  2. Natural Science Foundation of Hebei Province, China [F2021208011, G2020208002]

Obtaining comprehensible classification rules is crucial in real applications, and decision-tree methods are commonly used for this purpose. However, in big data scenarios their performance is unsatisfactory, and their underlying data-processing methods lack theoretical support. This study introduces a sampling-based classification rule mining (SCRM) method to improve the adaptability and generalization ability of classification rules in big data environments. SCRM was evaluated on seven UCI datasets and showed good classification ability.
Obtaining comprehensible classification rules may be extremely important in many real applications, such as data-driven decision-making and classification tasks. Decision-tree methods are powerful and popular tools for acquiring classification rules. However, they do not perform well in big data scenarios, and their underlying data-processing methods lack strong theoretical support there. This study introduces sampling schemes, with and without replacement, into the implementation of decision tree methods. The resulting method, called sampling-based classification rule mining (SCRM), is designed to improve the adaptability and generalization ability of classification rules in a big data environment. Sampling without replacement is used to refine classification rules based on the concepts of conflict rules and coverage rules, while sampling with replacement is used to determine rule reliability; the reliability approximation property of the classification rules is proved using the law of large numbers. The effectiveness of SCRM was evaluated and verified on seven UCI datasets. Theoretical analysis and experimental results show that SCRM is generic and has good classification ability, thereby improving the classification accuracy of the rules. A significant advantage of SCRM is that it provides theoretical and methodological support for classification rule mining on big data, so it can be used in many applications. (c) 2022 Elsevier B.V. All rights reserved.
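The abstract names two sampling roles but gives no procedural detail. The Python sketch below is one possible reading, not the authors' SCRM implementation. It assumes: rules of the form "IF attribute == value AND ... THEN label" (the hypothetical `Rule` class); a hypothetical `class` key holding each record's label; and a hypothetical merge policy in which all rules sharing identical conditions but disagreeing on the label (conflict) are dropped, and a rule is dropped when a strictly more general rule with the same label exists (coverage). The decision-tree mining step on each chunk is omitted; rules are supplied by hand. All helper names are illustrative.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: IF every (attribute == value) pair holds THEN `label`."""
    conditions: frozenset  # frozenset of (attribute, value) pairs
    label: str

    def covers(self, row: dict) -> bool:
        return all(row.get(attr) == val for attr, val in self.conditions)


def disjoint_chunks(data, k, seed=0):
    """Sampling WITHOUT replacement: shuffle once, then cut into k disjoint chunks."""
    rng = random.Random(seed)
    shuffled = rng.sample(data, len(data))  # a random permutation; no record repeats
    return [shuffled[i::k] for i in range(k)]


def merge_rule_sets(rule_sets):
    """Hypothetical merge of rules mined from different chunks.
    - Conflict: identical conditions, different labels -> drop all such rules.
    - Coverage: a more general rule with the same label exists -> drop the
      more specific rule, since it adds no new coverage."""
    pooled = {r for rules in rule_sets for r in rules}
    labels_by_cond = {}
    for r in pooled:
        labels_by_cond.setdefault(r.conditions, set()).add(r.label)
    consistent = {r for r in pooled if len(labels_by_cond[r.conditions]) == 1}
    return {
        r for r in consistent
        if not any(o.label == r.label and o.conditions < r.conditions  # proper subset
                   for o in consistent)
    }


def confidence(rule, sample):
    """Fraction of covered records whose class equals the rule's label (None if none covered)."""
    covered = [row for row in sample if rule.covers(row)]
    if not covered:
        return None
    return sum(row["class"] == rule.label for row in covered) / len(covered)


def bootstrap_reliability(rule, data, n_samples=200, sample_size=None, seed=0):
    """Sampling WITH replacement: average the rule's confidence over bootstrap samples.
    Draws are i.i.d. from the data, so by the law of large numbers each sample's
    confidence approaches the rule's confidence on the full data as the sample grows."""
    rng = random.Random(seed)
    size = sample_size or len(data)
    scores = []
    for _ in range(n_samples):
        sample = rng.choices(data, k=size)  # with replacement
        c = confidence(rule, sample)
        if c is not None:
            scores.append(c)
    return sum(scores) / len(scores) if scores else None


if __name__ == "__main__":
    # Toy data: 80 "sunny" records, 60 of which are labelled "play".
    data = (
        [{"outlook": "sunny", "windy": "no", "class": "play"}] * 60
        + [{"outlook": "sunny", "windy": "yes", "class": "stay"}] * 20
        + [{"outlook": "rain", "windy": "no", "class": "stay"}] * 20
    )
    print([len(c) for c in disjoint_chunks(data, k=4)])  # [25, 25, 25, 25]
    sunny_play = Rule(frozenset({("outlook", "sunny")}), "play")
    print(confidence(sunny_play, data))  # 0.75 on the full data
    print(bootstrap_reliability(sunny_play, data, sample_size=100))  # close to 0.75
```

A single shuffle followed by striding guarantees that the chunks are disjoint and together contain every record exactly once, which is what distinguishes the without-replacement step from the bootstrap step. Averaging over many bootstrap samples reduces the variance of the reliability estimate; the paper's own reliability definition and proof may differ from this confidence-based stand-in.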
