☆ 4.7 Article

Relative discrimination criterion - A novel feature ranking method for text data

EXPERT SYSTEMS WITH APPLICATIONS (2015)

Journal

EXPERT SYSTEMS WITH APPLICATIONS

Volume 42, Issue 7, Pages 3670-3681

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.eswa.2014.12.013

Keywords

Text classification; Feature selection; Document frequency; Term count; True positive rate; False positive rate

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Abstract

High dimensionality of text data hinders the performance of classifiers making it necessary to apply feature selection for dimensionality reduction. Most of the feature ranking metrics for text classification are based on document frequencies (df) of a term in positive and negative classes. Considering only document frequencies to rank features favors terms frequently occurring in larger classes in unbalanced datasets. In this paper we introduce a new feature ranking metric termed as relative discrimination criterion (RDC), which takes document frequencies for each term count of a term into account while estimating the usefulness of a term. The performance of RDC is compared with four well known feature ranking metrics, information gain (IG), CHI squared (CHI), odds ratio (OR) and distinguishing feature selector (DFS) using support vector machines (SVM) and multinomial naive Bayes (MNB) classifiers on four benchmark datasets, namely Reuters, 20 Newsgroups and two subsets of Ohsumed dataset. Our results based on macro and micro F1 measures show that the performance of RDC is superior than the other four metrics in 65% of our experimental trials. Also, RDC attains highest macro and micro F1 values in 69% of the cases. (C) 2014 Elsevier Ltd. All rights reserved.

Relative discrimination criterion - A novel feature ranking method for text data

Journal

EXPERT SYSTEMS WITH APPLICATIONS

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

Keywords

Categories

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Relative discrimination criterion - A novel feature ranking method for text data

Journal

EXPERT SYSTEMS WITH APPLICATIONS

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

Keywords

Categories

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper