Article

Feature selection based on feature interactions with application to text categorization

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 120, Pages 207-216

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2018.11.018

Keywords

Feature selection; Feature interaction; Mutual information; Joint mutual information; Text categorization

Funding

  1. National Natural Science Foundation of China [61602094]
  2. Innovational Team Project of Sichuan Province [2015TD0002]

Abstract

Feature selection is an important preprocessing step for machine learning and text mining that reduces the dimensionality of high-dimensional data. A popular approach is based on information-theoretic measures. Most existing methods use two- and three-dimensional mutual information terms, which are ineffective at detecting higher-order feature interactions. To fill this gap, we employ two- through five-way interactions for feature selection. We first identify a relaxed assumption that decomposes the mutual information-based feature selection problem into a sum of low-order interactions. Because a direct calculation of the decomposed interaction terms is computationally expensive, we estimate them with five-dimensional joint mutual information, a computationally efficient measure. We use the 'maximum of the minimum' nonlinear approach to avoid overestimating feature significance. We also apply the proposed method to text categorization. To evaluate its performance, we compare it with eleven popular feature selection methods on eighteen benchmark datasets and seven text categorization datasets. Experimental results with four different types of classifiers provide concrete evidence that higher-order interactions are effective in improving feature selection methods. (C) 2018 Elsevier Ltd. All rights reserved.
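
As a rough illustration of the selection criterion summarized above, the following Python sketch (not the authors' implementation) applies a greedy 'maximum of the minimum' rule over joint mutual information terms computed on discrete features. For brevity the sketch stops at pairwise terms I((X_j, X_k); Y), whereas the paper estimates interactions with up to five-dimensional joint mutual information; all function names here are illustrative assumptions.

import numpy as np

def mutual_info(x, y):
    # I(X; Y) in nats for two discrete 1-D arrays, computed from the joint histogram.
    xs, x_idx = np.unique(x, return_inverse=True)
    ys, y_idx = np.unique(y, return_inverse=True)
    joint = np.zeros((len(xs), len(ys)))
    np.add.at(joint, (x_idx, y_idx), 1)      # count co-occurrences
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def joint_mutual_info(xj, xk, y):
    # I((X_j, X_k); Y): treat the feature pair as one composite discrete variable.
    pair = np.unique(np.stack([xj, xk], axis=1), axis=0, return_inverse=True)[1].ravel()
    return mutual_info(pair, y)

def select_max_of_min(X, y, k):
    # Greedy 'maximum of the minimum' selection: start from the single most
    # informative feature, then repeatedly add the candidate whose *minimum*
    # joint MI with any already selected feature is largest.
    n_features = X.shape[1]
    selected = [int(np.argmax([mutual_info(X[:, j], y) for j in range(n_features)]))]
    while len(selected) < k:
        remaining = [j for j in range(n_features) if j not in selected]
        scores = [min(joint_mutual_info(X[:, j], X[:, s], y) for s in selected)
                  for j in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return selected

# Example (hypothetical data): selected = select_max_of_min(X_discrete, y_labels, k=50)

Taking the minimum over already selected features penalizes candidates that add little beyond at least one chosen feature, which is the intuition behind using the nonlinear max-of-min rule to avoid overestimating feature significance.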

