Article

Selective sampling for trees and forests

NEUROCOMPUTING (2019)

Journal

NEUROCOMPUTING

Volume 358, Pages 93-108

Publisher

ELSEVIER

DOI: 10.1016/j.neucom.2019.04.071

Keywords

Selective sampling; Decision trees; Random forests; Classification; Active learning

Category

Computer Science, Artificial Intelligence

Funding

450 mm Consortium of the Israeli Ministry of Industry and Commerce
Applied Materials Company

Abstract

In this paper we describe selective sampling algorithms for decision trees and random forests and their contribution to classification accuracy. In our selective sampling algorithms, the instance that yields the highest expected utility is chosen to be labeled by the expert. We show that the most valuable unlabeled instance to be labeled by the expert and added to the training dataset of the decision tree can be found simply by estimating the influence of this new instance on the class probabilities of the leaves. All unlabeled instances that fall into the same leaf share the same class probabilities. As a result, the expected accuracy of the decision tree can be computed per leaf instead of per individual unlabeled instance. An extension for random forests is also presented. Moreover, we show that the selective sampling classifier must belong to the same family as the classifier whose accuracy we wish to improve, but need not be identical to it. For example, a random forest classifier can be used for the selective sampling process, and the results can be used to improve the classification accuracy of a decision tree. Likewise, a random forest of three trees can be used in the selective sampling algorithm to improve the classification accuracy of a random forest of ten trees. Our experiments show that the proposed selective sampling algorithms achieve better accuracy than standard random sampling, uncertainty sampling, and the active belief decision tree learning approach (ABC4.5) on several real-world datasets. We also show that our selective sampling algorithms significantly improve the classification performance of several state-of-the-art classifiers, such as the random rotation forest classifier, on large-scale real-world datasets. © 2019 Published by Elsevier B.V.
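
The key observation in the abstract is that unlabeled instances in the same leaf share class probabilities, so expected utility can be scored once per leaf rather than once per instance. The Python sketch below illustrates that idea under several assumptions that are not taken from the paper: leaf class probabilities are Laplace-smoothed, the utility is the tree's self-estimated accuracy on the unlabeled pool, the tree structure is held fixed while a hypothetical label is added to one leaf, and the leaf representation (`counts`, `pool`) and all function names are hypothetical. It is an illustration of leaf-level expected-utility scoring, not the authors' algorithm or code.

```python
from typing import List


def class_probs(counts: List[int]) -> List[float]:
    """Laplace-smoothed class probabilities of one leaf (an assumed estimator)."""
    total = sum(counts) + len(counts)
    return [(c + 1) / total for c in counts]


def expected_pool_accuracy(leaves: List[dict]) -> float:
    """Self-estimated accuracy of the tree on the unlabeled pool.

    Every pool instance routed to a leaf shares that leaf's class probabilities,
    so the sum runs over leaves, weighted by each leaf's pool count.
    """
    pool_size = sum(leaf["pool"] for leaf in leaves)
    hits = sum(leaf["pool"] * max(class_probs(leaf["counts"])) for leaf in leaves)
    return hits / pool_size


def leaf_utility(leaves: List[dict], index: int) -> float:
    """Expected pool accuracy if one instance from leaves[index] were labeled.

    The unknown label is averaged over the leaf's current class probabilities.
    The tree structure is held fixed: only that leaf's class counts change.
    """
    leaf = leaves[index]
    utility = 0.0
    for label, prob in enumerate(class_probs(leaf["counts"])):
        counts = list(leaf["counts"])
        counts[label] += 1  # hypothesize that the expert returns this label
        updated = list(leaves)
        updated[index] = {"counts": counts, "pool": leaf["pool"]}
        utility += prob * expected_pool_accuracy(updated)
    return utility


def select_leaf(leaves: List[dict]) -> int:
    """Index of the leaf to query; in this sketch any pool instance in it is equivalent."""
    candidates = [i for i, leaf in enumerate(leaves) if leaf["pool"] > 0]
    return max(candidates, key=lambda i: leaf_utility(leaves, i))


# Toy tree with three leaves: labeled class counts and unlabeled pool counts.
leaves = [
    {"counts": [9, 1], "pool": 50},  # confident leaf
    {"counts": [3, 3], "pool": 20},  # tied leaf
    {"counts": [0, 0], "pool": 5},   # leaf with no labeled data yet
]
print(select_leaf(leaves))  # prints 1
```

With these numbers the tied leaf is selected. Under this particular utility, labeling an instance in the confident leaf leaves the expected accuracy unchanged, because no single new label can flip its majority class, while the empty leaf gains more per instance but holds fewer pool instances. A practical version would update only the changed leaf's term instead of re-scoring every leaf, which is where grouping by leaf saves work.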