Article

Sampling scheme-based classification rule mining method using decision tree in big data environment

KNOWLEDGE-BASED SYSTEMS (2022)

Journal

KNOWLEDGE-BASED SYSTEMS

Volume 244, Issue -, Pages -

Publisher

ELSEVIER

DOI: 10.1016/j.knosys.2022.108522

Keywords

Classification rules; Decision tree; Sampling; Reliability; Big data

Categories

Computer Science, Artificial Intelligence

Funding

National Natural Science Foundation of China [72101082, 71771078, 71371064]
Natural Science Foundation of Hebei Province, China [F2021208011, G2020208002]

Summary

Obtaining comprehensible classification rules is crucial in real applications, and decision-tree methods are commonly used for this purpose. However, in big data scenarios their performance is unsatisfactory and the methods lack theoretical support. This study introduces a sampling-based classification rule mining (SCRM) method to improve the adaptability and generalization ability of classification rules in big data environments. SCRM was evaluated on seven UCI datasets and showed good classification ability.

Abstract

Obtaining comprehensible classification rules is important in many real applications, such as data-driven decision-making and classification tasks. Decision-tree methods are powerful and popular tools for acquiring classification rules. However, they do not perform well in big data scenarios, where the underlying data processing methods also lack strong theoretical support. This study introduces sampling schemes, both with and without replacement, into the implementation of decision-tree methods. The resulting method, called sampling-based classification rule mining (SCRM), is designed to improve the adaptation and generalization ability of classification rules in a big data environment. Sampling without replacement is used to refine classification rules through the concepts of conflict and coverage rules, while sampling with replacement is used to determine rule reliability; the reliability approximation property of classification rules is proved using the law of large numbers. The effectiveness of SCRM was evaluated and verified on seven UCI datasets. Theoretical analysis and experimental results show that SCRM is generic and has good classification ability, improving the classification accuracy of the rules. SCRM also provides theoretical and methodological support for classification rule mining on big data, and can therefore be used in many applications. (c) 2022 Elsevier B.V. All rights reserved.
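The role of sampling with replacement and the law of large numbers can be illustrated with a small Python sketch. This is not the SCRM algorithm from the paper: the abstract does not define rule reliability, so the sketch assumes it is the fraction of records covered by a rule whose label matches the rule's predicted class. The names `rule_reliability`, `bootstrap_reliability`, `make_toy_data`, and `TOY_RULE`, as well as the toy data, are hypothetical and exist only for illustration.

```python
import random
from statistics import mean


def rule_reliability(rule, data):
    """Assumed definition: share of covered records whose label equals the
    class predicted by the rule. Returns None if the rule covers nothing."""
    condition, predicted = rule
    covered = [label for features, label in data if condition(features)]
    if not covered:
        return None
    return sum(label == predicted for label in covered) / len(covered)


def bootstrap_reliability(rule, data, n_rounds, sample_size, seed=0):
    """Average the reliability of a rule over repeated samples drawn WITH
    replacement. By the law of large numbers, the average settles near the
    reliability measured on the full data as the number of draws grows."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_rounds):
        sample = rng.choices(data, k=sample_size)  # sampling with replacement
        estimate = rule_reliability(rule, sample)
        if estimate is not None:
            estimates.append(estimate)
    return mean(estimates)


def make_toy_data(n=1000):
    """Hypothetical data set: label is 'pos' when x > 0.5, with every
    tenth record deliberately mislabeled to simulate noise."""
    data = []
    for i in range(n):
        x = i / n
        label = "pos" if x > 0.5 else "neg"
        if i % 10 == 0:
            label = "neg" if label == "pos" else "pos"
        data.append(({"x": x}, label))
    return data


# Hypothetical rule of the form "IF x > 0.5 THEN class = pos".
TOY_RULE = (lambda features: features["x"] > 0.5, "pos")

if __name__ == "__main__":
    toy = make_toy_data()
    print("full-data reliability:", rule_reliability(TOY_RULE, toy))
    for rounds in (1, 10, 100):
        est = bootstrap_reliability(TOY_RULE, toy, rounds, sample_size=500)
        print(rounds, "rounds ->", est)
```

Running the script prints the full-data reliability next to estimates from 1, 10, and 100 resampling rounds; under the stated assumptions, the estimates should tend to move closer to the full-data value as the number of rounds increases.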
