Article

Simultaneous Learning of Sentence Clustering and Class Prediction for Improved Document Classification

INTERNATIONAL JOURNAL OF FUZZY LOGIC AND INTELLIGENT SYSTEMS (2017)

Journal

INTERNATIONAL JOURNAL OF FUZZY LOGIC AND INTELLIGENT SYSTEMS

Volume 17, Issue 1, Pages 35-42

Publisher

KOREAN INST INTELLIGENT SYSTEMS

DOI: 10.5391/IJFIS.2017.17.1.35

Keywords

Machine learning; Document classification; Sequence labeling; Term weighting

Category

Computer Science, Theory & Methods

Funding

Seoul National University of Science Technology

Abstract

In document classification, a document is commonly represented in the so-called bag-of-words form, which is essentially a global term distribution indicating how often each term appears in the text. Ignoring spatial statistics (i.e., where in the text the terms appear) can lead to a suboptimal solution. The key assumption in this paper is that the sentences of a document may have an underlying segmentation, and that this partitioning may be intuitively appealing (e.g., each group corresponds to a particular sentiment or gist of an argument). If the segmentation is known, terms belonging to the same group can be treated alike, and terms belonging to different groups can be treated differently, for classification. Based on this idea, we build a novel document classification model comprising two parts: a sentence tagger that predicts the group labels of sentences, and a classifier whose input features form a weighted term frequency vector, aggregated over all sentences but weighted differently per cluster according to the tagger's predictions. We suggest an efficient learning strategy for this model. On several benchmark document classification problems, we demonstrate that the proposed approach yields significantly improved classification performance over several existing algorithms.
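
The cluster-weighted term frequency vector described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function name `weighted_term_frequencies`, the toy document, the sentence labels, and the per-cluster weights are all hypothetical. In the paper the labels come from a learned sentence tagger and the weights are learned jointly with the classifier; here both are simply supplied by hand.

```python
from collections import Counter


def weighted_term_frequencies(sentences, cluster_labels, cluster_weights):
    """Aggregate term counts over all sentences of one document.

    Each sentence's counts are scaled by the weight of the cluster that the
    sentence is assigned to. All names and values are illustrative
    assumptions, not taken from the paper.
    """
    if len(sentences) != len(cluster_labels):
        raise ValueError("need exactly one cluster label per sentence")

    features = Counter()
    for tokens, label in zip(sentences, cluster_labels):
        weight = cluster_weights[label]
        for term, count in Counter(tokens).items():
            features[term] += weight * count
    return dict(features)


# Toy document: two tokenized sentences, each assigned to a different cluster.
doc = [
    ["the", "plot", "was", "dull"],
    ["the", "acting", "was", "superb"],
]
labels = [0, 1]              # hypothetical sentence-tagger output
weights = {0: 0.5, 1: 2.0}   # hypothetical per-cluster weights

features = weighted_term_frequencies(doc, labels, weights)

# With every cluster weight equal to 1, the result is the plain bag-of-words count.
uniform = weighted_term_frequencies(doc, labels, {0: 1.0, 1: 1.0})
```

Setting all weights to 1 recovers the ordinary bag-of-words vector, which shows that the weighted form generalizes the baseline representation the abstract criticizes.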
