Article

Handling over-fitting in test cost-sensitive decision tree learning by feature selection, smoothing and pruning

Journal

JOURNAL OF SYSTEMS AND SOFTWARE
Volume 83, Issue 7, Pages 1137-1147

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.jss.2010.01.002

Keywords

Classification; Cost-sensitive learning; Over-fitting

Funding

  1. Australian Research Council (ARC) [DP0985456]
  2. Nature Science Foundation (NSF) of China [90718020, 10661003]
  3. China 973 Program [2008CB317108]
  4. MOE [07JJD720044]
  5. Guangxi NSF
  6. Guangxi Colleges' Innovation Group

Cost-sensitive learning algorithms are typically designed to minimize the total cost when multiple costs are taken into account. Like other learning algorithms, cost-sensitive learning algorithms face a significant challenge in applied settings: over-fitting. Specifically, they can produce good results on training data yet fail to yield an optimal model when applied to unseen data in real-world applications; this phenomenon is called data over-fitting. This paper addresses data over-fitting by designing three simple and efficient strategies, feature selection, smoothing and threshold pruning, for the TCSDT (test cost-sensitive decision tree) method. Feature selection is used to pre-process the data set before the TCSDT algorithm is applied. Smoothing and threshold pruning are applied within the TCSDT algorithm before the class probability estimate is calculated for each decision tree leaf. To evaluate our approaches, we conduct extensive experiments on selected UCI data sets across different cost ratios, and on a real-world data set, KDD-98, with real misclassification costs. The experimental results show that our algorithms outperform both the original TCSDT and other competing algorithms in reducing data over-fitting. (C) 2010 Elsevier Inc. All rights reserved.
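The smoothing the abstract refers to is, in decision tree learning, commonly realized as a Laplace correction of the class probability estimate at each leaf, which keeps a leaf trained on few examples from reporting an extreme probability of 0 or 1. The sketch below is a minimal illustration of that general technique, not the exact variant used inside TCSDT; the function name is hypothetical.

```python
def laplace_estimate(class_counts):
    """Laplace-smoothed class probability estimates for a decision tree leaf.

    Each raw count n_c is replaced by (n_c + 1) / (n + C), where n is the
    total number of training examples reaching the leaf and C is the number
    of classes, so no class ever receives probability exactly 0 or 1.
    """
    total = sum(class_counts)
    num_classes = len(class_counts)
    return [(c + 1) / (total + num_classes) for c in class_counts]

# A leaf holding 8 positive and 0 negative training examples: the raw
# estimate would be (1.0, 0.0); the smoothed estimate is (9/10, 1/10).
print(laplace_estimate([8, 0]))  # [0.9, 0.1]
```

Damping extreme leaf probabilities matters especially in cost-sensitive settings, where a misclassification-cost-weighted decision can be flipped by an over-confident estimate from a sparsely populated leaf.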

Authors

