Article

A note on split selection bias in classification trees

Journal

COMPUTATIONAL STATISTICS & DATA ANALYSIS
Volume 45, Issue 3, Pages 457-466

Publisher

ELSEVIER SCIENCE BV
DOI: 10.1016/S0167-9473(03)00064-1

Keywords

Cramér's V² statistic; Kolmogorov-Smirnov statistic; P-value; Pearson chi-square statistic


A common approach to split selection in classification trees is to search through all possible splits generated by the predictor variables. Each split is evaluated with a splitting criterion, and the split with the largest criterion value is usually chosen to route samples into the corresponding subnodes. However, this greedy method is biased in variable selection when the variables offer different numbers of available split points. Such bias may undermine the intuitive appeal of classification trees. The problem of split selection bias is examined for two-class tasks with numerical predictors. A statistical explanation of its existence is given, and a solution based on P-values is provided for the case in which the Pearson chi-square statistic is used as the splitting criterion. (C) 2003 Elsevier B.V. All rights reserved.
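The greedy search described above, and the bias it produces, can be illustrated with a short simulation. This is a minimal sketch of our own, not the paper's code. It assumes a binary class label coded 0/1, candidate cut points at the midpoints between consecutive distinct predictor values, and hypothetical helper names (`pearson_chi2_2x2`, `best_split`, `selection_frequency`, `permutation_pvalue`). Both simulated predictors are pure noise, so an unbiased rule would favor each about equally often. For the P-value step it substitutes a Monte Carlo permutation P-value of the maximal statistic. That is a generic stand-in technique, not the analytical solution derived in the paper.

```python
import numpy as np


def pearson_chi2_2x2(left_y, right_y):
    """Pearson chi-square statistic for the 2x2 table (child node x class label)."""
    table = np.array(
        [
            [np.sum(left_y == 0), np.sum(left_y == 1)],
            [np.sum(right_y == 0), np.sum(right_y == 1)],
        ],
        dtype=float,
    )
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / table.sum()
    # A class absent from the node gives an all-zero column; count those cells as 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        cells = np.where(expected > 0, (table - expected) ** 2 / expected, 0.0)
    return float(cells.sum())


def best_split(x, y):
    """Greedy search over every midpoint between consecutive distinct values of x.

    A predictor with k distinct values offers k - 1 candidate splits.
    Returns (largest chi-square, cut point), or (0.0, None) if x is constant.
    """
    values = np.unique(x)
    best_stat, best_cut = 0.0, None
    for lo, hi in zip(values[:-1], values[1:]):
        cut = (lo + hi) / 2.0
        stat = pearson_chi2_2x2(y[x <= cut], y[x > cut])
        if stat > best_stat:
            best_stat, best_cut = stat, cut
    return best_stat, best_cut


def selection_frequency(n=100, n_trials=200, seed=0):
    """Share of trials in which a many-valued noise predictor beats a binary one.

    Both predictors are independent of y, so an unbiased rule would pick
    each about half the time.
    """
    rng = np.random.default_rng(seed)
    wins_many = 0
    for _ in range(n_trials):
        y = rng.integers(0, 2, size=n)
        x_few = rng.integers(0, 2, size=n)  # binary: at most 1 candidate split
        x_many = rng.normal(size=n)         # continuous: n - 1 candidate splits
        wins_many += best_split(x_many, y)[0] > best_split(x_few, y)[0]
    return wins_many / n_trials


def permutation_pvalue(x, y, n_perm=99, seed=0):
    """Monte Carlo P-value of the maximal chi-square over all splits of x.

    Stand-in technique (not the paper's derivation): each permutation of y
    re-runs the whole split search, so the reference distribution reflects
    how many candidate splits x offers.
    """
    rng = np.random.default_rng(seed)
    observed = best_split(x, y)[0]
    exceed = sum(int(best_split(x, rng.permutation(y))[0] >= observed) for _ in range(n_perm))
    return (exceed + 1) / (n_perm + 1)
```

With the default settings, `selection_frequency()` should return a share well above 0.5: the continuous noise predictor gets 99 chances to produce a large statistic, while the binary one gets a single chance. Comparing variables by a P-value of their maximal statistic, rather than by the raw statistic, is what removes that advantage. For a predictor that separates the classes perfectly, such as `x = np.arange(40)` with `y = (x >= 20).astype(int)`, `permutation_pvalue(x, y)` returns the smallest attainable value, 1 / (n_perm + 1).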
