Article

Feature selection via maximizing global information gain for text classification

KNOWLEDGE-BASED SYSTEMS (2013)

Journal

KNOWLEDGE-BASED SYSTEMS

Volume 54, Pages 298-309

Publisher

ELSEVIER

DOI: 10.1016/j.knosys.2013.09.019

Keywords

Feature selection; Text classification; High dimensionality; Distributional clustering; Information bottleneck

Categories

Computer Science, Artificial Intelligence

Funding

National Science Foundation of China [61303167, 61363047]
Science and Technology Support Foundation of Jiangxi Province [20111BBE50008]

Abstract

Feature selection is a vital preprocessing step in text classification, used to mitigate the curse of dimensionality. Most existing metrics (such as information gain) evaluate features individually and completely ignore the redundancy between them. This can decrease the overall discriminative power of the selected set, because one feature's predictive power is weakened by others. On the other hand, although higher-order algorithms (such as mRMR) take redundancy into account, their high computational complexity makes them impractical in the text domain. This paper proposes a novel metric called global information gain (GIG), which avoids redundancy naturally. An efficient feature selection method called maximizing global information gain (MGIG) is also given. We compare MGIG with four other algorithms on six datasets; the experimental results show that MGIG outperforms the other methods in most cases. Moreover, MGIG runs significantly faster than the traditional higher-order algorithms, which makes it a suitable choice for feature selection in the text domain. © 2013 Elsevier B.V. All rights reserved.
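To make the abstract's point about individually scored features concrete, here is a minimal Python sketch of classic per-term information gain, the existing metric the abstract names. It is not GIG or MGIG, whose definitions are not given on this page. The toy corpus, the class labels, and the function names `entropy` and `information_gain` are assumptions made purely for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """IG(C; t) = H(C) - P(t) * H(C | t present) - P(not t) * H(C | t absent)."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


# Hypothetical toy corpus: each document is represented as a set of terms.
docs = [
    {"goal", "match", "team"},
    {"goal", "score", "team"},
    {"vote", "party", "team"},
    {"vote", "election", "party"},
]
labels = ["sport", "sport", "politics", "politics"]

# Score every term independently, then rank by score (highest first).
vocabulary = sorted(set().union(*docs))
scores = {t: information_gain(docs, labels, t) for t in vocabulary}
ranked = sorted(vocabulary, key=lambda t: scores[t], reverse=True)

for term in ranked:
    print(f"{term:10s} {scores[term]:.4f}")
```

In this toy data, "goal", "party", and "vote" each separate the two classes perfectly, so they tie for the top score even though they carry the same information about the class. A ranking that scores each term on its own has no way to notice that, which is the redundancy problem the abstract describes.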
