4.7 Article

EDISC: A Class-Tailored Discretization Technique for Rule-Based Classification

Journal

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
Volume 24, Issue 8, Pages 1435-1447

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TKDE.2011.101

Keywords

Discretization; continuous values; discrete values; data transformation; data mining; machine learning; inductive learning; supervised learning; rule induction

Ask authors/readers for more resources

Discretization is a critical component of data mining whereby continuous attributes of a data set are converted into discrete ones by creating intervals either before or during learning. There are many good reasons for preprocessing discretization, such as increased learning efficiency and classification accuracy, comprehensibility of data mining results, as well as the inherent limitation of a great majority of learning algorithms to handle only discrete data. Many preprocessing discretization techniques have been proposed to date, of which the Entropy-MDLP discretization has been accepted as by far the most effective in the context of both decision tree learning and rule induction algorithms. This paper presents a new discretization technique EDISC which utilizes the entropy-based principle but takes a class-tailored approach to discretization. The technique is applicable in general to any covering algorithm, including those that use the class-per-class rule induction methodology such as CN2 as well as those that use a seed example during the learning phase, such as the RULES family. Experimental evaluation has proved the efficiency and effectiveness of the technique as a preprocessing discretization procedure for CN2 as well as RULES-7, the latest algorithm among the RULES family of inductive learning algorithms.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available