Article

BRACID: a comprehensive approach to learning rules from imbalanced data

JOURNAL OF INTELLIGENT INFORMATION SYSTEMS (2012)

Journal

JOURNAL OF INTELLIGENT INFORMATION SYSTEMS

Volume 39, Issue 2, Pages 335-373

Publisher

SPRINGER

DOI: 10.1007/s10844-011-0193-0

Keywords

Rule induction; Imbalanced data; Classifiers; Nearest neighbour paradigm; Nearest rules

Categories

Computer Science, Artificial Intelligence; Computer Science, Information Systems

Funding

Ministry of Science and Higher Education [N N519 441939]


Abstract

In this paper we consider induction of rule-based classifiers from imbalanced data, where one class (a minority class) is under-represented in comparison to the remaining majority classes. The minority class is usually of primary interest. However, most rule-based classifiers are biased towards the majority classes and have difficulty recognizing the minority class correctly. We discuss the sources of these difficulties, which relate either to data characteristics or to the algorithm itself. Among the problems related to the data distribution, we focus on the role of small disjuncts, overlapping of classes and the presence of noisy examples. Then, we show that standard techniques for induction of rule-based classifiers, such as sequential covering, top-down induction of rules or classification strategies, were created with the assumption of a balanced data distribution, and we explain why they are biased towards the majority classes. Some modifications of rule-based classifiers have already been introduced, but they usually concentrate on individual problems. Therefore, we propose a novel algorithm, BRACID, which addresses the issues associated with imbalanced data more comprehensively. Its main characteristics include a hybrid representation of rules and single examples, bottom-up learning of rules and a local classification strategy using nearest rules. The usefulness of BRACID has been evaluated in experiments on several imbalanced datasets. The results show that BRACID significantly outperforms the well-known rule-based classifiers C4.5rules, RIPPER, PART, CN2 and MODLEM, as well as other related classifiers such as RISE or K-NN. Moreover, it is comparable to or better than the studied approaches specialized for imbalanced data, such as generalizations of rule algorithms or combinations of SMOTE + ENN preprocessing with PART. Finally, it improves the support of minority class rules, leading to better recognition of the minority class examples.
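The abstract names two ideas that a small example may make more concrete: a hybrid representation in which single examples sit alongside rules, and classification by the nearest rule. The Python sketch below illustrates only those two ideas for numeric attributes. It is not the BRACID algorithm: the `Rule` class, the helper names, the interval conditions, the unnormalized Euclidean distance and the toy data are all assumptions made for illustration, and the bottom-up learning step is omitted entirely.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Rule:
    """Hypothetical rule: one closed interval per numeric attribute."""
    conditions: dict  # attribute name -> (low, high)
    label: str


def rule_from_example(example, label):
    """Treat a single example as a maximally specific rule (hybrid representation)."""
    return Rule({attr: (value, value) for attr, value in example.items()}, label)


def distance(example, rule):
    """Zero if the example satisfies every condition; otherwise the Euclidean
    distance to the nearest interval bounds (an assumed, unnormalized measure)."""
    total = 0.0
    for attr, (low, high) in rule.conditions.items():
        value = example[attr]
        if value < low:
            total += (low - value) ** 2
        elif value > high:
            total += (value - high) ** 2
    return sqrt(total)


def classify(example, rules):
    """Return the label of the single nearest rule."""
    return min(rules, key=lambda rule: distance(example, rule)).label


# Toy rule set: one general majority rule and one retained minority example.
rules = [
    Rule({"x": (0.0, 5.0), "y": (0.0, 5.0)}, "majority"),
    rule_from_example({"x": 6.0, "y": 6.0}, "minority"),
]

print(classify({"x": 5.8, "y": 5.9}, rules))  # nearest to the minority example
print(classify({"x": 2.0, "y": 3.0}, rules))  # covered by the majority rule
```

In this toy setting, a query just outside the majority rule is still assigned to the minority class because the retained minority example is closer, which is the kind of local decision a nearest-rule strategy allows. A covering-based strategy that only fires on matched rules would have no rule to apply to that query.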
