Article

Selective sampling for trees and forests

Journal

NEUROCOMPUTING
Volume 358, Pages 93-108

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2019.04.071

Keywords

Selective sampling; Decision trees; Random forests; Classification; Active learning

Funding

  1. 450 mm Consortium of the Israeli Ministry of Industry and Commerce
  2. Applied Materials Company


In this paper we describe selective sampling algorithms for decision trees and random forests and their contribution to classification accuracy. In our selective sampling algorithms, the instance that yields the highest expected utility is chosen to be labeled by the expert. We show that the most valuable unlabeled instance to be labeled by the expert and added to the training dataset of the decision tree can be found simply by assessing the influence of this new instance on the class probabilities of the leaves. All the unlabeled instances that fall into the same leaf have the same class probabilities. As a result, we can compute the expected accuracy of the decision tree per leaf instead of per individual unlabeled instance. An extension for random forests is also presented. Moreover, we show that the selective sampling classifier has to belong to the same family as the classifier whose accuracy we wish to improve, but need not be identical to it. For example, a random forest classifier can be used for the selective sampling process, and the results can be used to improve the classification accuracy of a decision tree. Likewise, a random forest classifier consisting of three trees can be used in the selective sampling algorithm to improve the classification accuracy of a random forest consisting of ten trees. Our experiments show that the proposed selective sampling algorithms achieve better accuracy than standard random sampling, uncertainty sampling, and the active belief decision tree learning approach (ABC4.5) on several real-world datasets. We also show that our selective sampling algorithms significantly improve the classification performance of several state-of-the-art classifiers, such as the random rotation forest classifier, on real-world large-scale datasets. (C) 2019 Published by Elsevier B.V.
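To make the leaf-level idea concrete, here is a minimal pure-Python sketch under stated assumptions; it is not the authors' published algorithm. It assumes a fixed tree structure (no re-growing after a new label), Laplace-smoothed leaf class probabilities, and a utility equal to the expected number of correctly classified unlabeled instances. Because every unlabeled instance in a leaf shares that leaf's class probabilities, one gain is computed per leaf. The random forest variant, which averages leaf gains across trees, is likewise an assumption, and all function names (`leaf_probs`, `expected_leaf_gain`, `select_leaf`, `select_instance_forest`) are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def leaf_probs(counts: Sequence[int]) -> List[float]:
    """Laplace-smoothed class probabilities of one leaf (an assumed estimator)."""
    total = sum(counts) + len(counts)
    return [(c + 1) / total for c in counts]


def expected_leaf_gain(counts: Sequence[int], n_unlabeled: int) -> float:
    """Expected gain in correctly classified unlabeled instances of one leaf
    if a single instance from that leaf is labeled (tree structure kept fixed)."""
    p = leaf_probs(counts)
    current = max(p)  # estimated probability that the leaf's predicted class is correct
    expected_after = 0.0
    for label, p_label in enumerate(p):
        # The expert's answer is unknown, so weight each possible label by p(label | leaf).
        new_counts = list(counts)
        new_counts[label] += 1
        expected_after += p_label * max(leaf_probs(new_counts))
    return n_unlabeled * (expected_after - current)


def select_leaf(leaves: Dict[str, Sequence[int]], unlabeled: Dict[str, int]) -> str:
    """Decision tree: return the leaf with the highest expected gain. All unlabeled
    instances in a leaf share one score, so only one gain per leaf is computed."""
    return max(leaves, key=lambda leaf: expected_leaf_gain(leaves[leaf], unlabeled.get(leaf, 0)))


def select_instance_forest(
    forest_leaves: List[Dict[str, Sequence[int]]],
    instance_leaves: List[List[str]],
) -> Tuple[int, List[float]]:
    """Random forest variant (an assumption): score each unlabeled instance by the
    average gain of the leaves it reaches, one leaf per tree."""
    n_trees = len(forest_leaves)
    # Number of unlabeled instances reaching each leaf, per tree.
    reach = [Counter(path[t] for path in instance_leaves) for t in range(n_trees)]
    # Gains are still computed once per (tree, leaf), not once per instance.
    gains = [
        {leaf: expected_leaf_gain(forest_leaves[t][leaf], n) for leaf, n in reach[t].items()}
        for t in range(n_trees)
    ]
    scores = [sum(gains[t][path[t]] for t in range(n_trees)) / n_trees for path in instance_leaves]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    # Toy data: labeled class counts per leaf and unlabeled instances per leaf.
    leaves = {"A": [5, 0], "B": [1, 1], "C": [2, 2]}
    unlabeled = {"A": 40, "B": 10, "C": 30}
    print(select_leaf(leaves, unlabeled))  # → C

    forest = [{"A": [5, 0], "B": [1, 1]}, {"X": [2, 2], "Y": [4, 0]}]
    paths = [["A", "X"], ["B", "Y"], ["B", "X"]]  # leaf reached in each tree
    best, scores = select_instance_forest(forest, paths)
    print(best)  # → 2
```

With this utility, a leaf whose majority class stays the majority whichever label the expert gives gets zero gain (leaf "A" above), so the score favors ambiguous leaves that hold many unlabeled instances. With scikit-learn, the leaf each instance reaches could be obtained from `DecisionTreeClassifier.apply` or `RandomForestClassifier.apply`.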

