4.7 Article

EDISC: A Class-Tailored Discretization Technique for Rule-Based Classification

期刊

出版社

IEEE COMPUTER SOC
DOI: 10.1109/TKDE.2011.101

关键词

Discretization; continuous values; discrete values; data transformation; data mining; machine learning; inductive learning; supervised learning; rule induction

向作者/读者索取更多资源

Discretization is a critical component of data mining whereby continuous attributes of a data set are converted into discrete ones by creating intervals either before or during learning. There are many good reasons for preprocessing discretization, such as increased learning efficiency and classification accuracy, comprehensibility of data mining results, as well as the inherent limitation of a great majority of learning algorithms to handle only discrete data. Many preprocessing discretization techniques have been proposed to date, of which the Entropy-MDLP discretization has been accepted as by far the most effective in the context of both decision tree learning and rule induction algorithms. This paper presents a new discretization technique EDISC which utilizes the entropy-based principle but takes a class-tailored approach to discretization. The technique is applicable in general to any covering algorithm, including those that use the class-per-class rule induction methodology such as CN2 as well as those that use a seed example during the learning phase, such as the RULES family. Experimental evaluation has proved the efficiency and effectiveness of the technique as a preprocessing discretization procedure for CN2 as well as RULES-7, the latest algorithm among the RULES family of inductive learning algorithms.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.7
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据