Proceedings Paper

A Semantics Aware Random Forest for Text Classification

PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM '19) (2019)

Journal

PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM '19)

Volume -, Issue -, Pages 1061-1070

Publisher

ASSOC COMPUTING MACHINERY

DOI: 10.1145/3357384.3357891

Keywords

Random Forest; Ensemble Selection; Semantic Explanations

Category

Computer Science, Theory & Methods

Funding

University of South Australia
D2DCRC, Cooperative Research Centres Programme

Abstract

Random Forest (RF) classifiers are well suited to the high-dimensional, noisy data found in text classification. An RF model comprises a set of decision trees, each of which is trained using random subsets of features. Given an instance, the RF prediction is obtained by majority voting over the predictions of all the trees in the forest. However, different test instances have different values for the features used in the trees, so the trees should contribute differently to each prediction. Traditional RFs do not account for this diverse contribution. Many approaches have been proposed to model it by selecting a subset of trees for each instance, and this paper is one of them. It proposes a Semantics Aware Random Forest (SARF) classifier. SARF extracts the features that the trees use to generate their predictions and selects the subset of predictions whose features are relevant to the predicted classes. We evaluated SARF's classification performance on 30 real-world text datasets and assessed its competitiveness with state-of-the-art ensemble selection methods. The results demonstrate the superior performance of the proposed approach in textual information retrieval and open a new direction of research that utilises the interpretability of classifiers.
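
The contrast between plain majority voting and per-instance selection of tree predictions can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how feature relevance is determined, so the `relevant_words` lexicon, the hand-written toy trees, and the fallback to majority voting are all assumptions made purely for illustration.

```python
from collections import Counter

# A toy tree is either a leaf label (str) or a tuple
# (word, subtree_if_word_absent, subtree_if_word_present).
def predict_with_path(tree, doc_words):
    """Return (predicted_label, list of words tested on the decision path)."""
    path = []
    while not isinstance(tree, str):
        word, if_absent, if_present = tree
        path.append(word)
        tree = if_present if word in doc_words else if_absent
    return tree, path

def majority_vote(forest, doc_words):
    """Traditional RF aggregation: every tree gets one equal vote."""
    votes = Counter(predict_with_path(t, doc_words)[0] for t in forest)
    return votes.most_common(1)[0][0]

def semantics_aware_vote(forest, doc_words, relevant_words):
    """Keep only predictions whose path tested a word relevant to the
    predicted class (relevance lexicon is hypothetical), then vote."""
    kept = []
    for tree in forest:
        label, path = predict_with_path(tree, doc_words)
        if any(w in relevant_words.get(label, set()) for w in path):
            kept.append(label)
    if not kept:
        # Assumed fallback: no prediction passed the filter, use all trees.
        return majority_vote(forest, doc_words)
    return Counter(kept).most_common(1)[0][0]

# Hypothetical class-to-relevant-words lexicon.
relevant_words = {"sports": {"match", "goal"}, "finance": {"stock", "bank"}}

# Three hand-written trees: one tests a topical word, two test stop words.
forest = [
    ("goal", "finance", "sports"),
    ("the", "sports", "finance"),
    ("and", "sports", "finance"),
]

doc = {"the", "and", "goal", "late"}
print(majority_vote(forest, doc))                          # finance
print(semantics_aware_vote(forest, doc, relevant_words))   # sports
```

In this toy case the two trees that split on stop words outvote the one tree that split on a topical word, so majority voting returns `finance`. Filtering by relevance discards those two predictions and leaves `sports`.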
