Article

Feature selection using an improved Chi-square for Arabic text classification

Publisher

ELSEVIER
DOI: 10.1016/j.jksuci.2018.05.010

Keywords

Feature selection; Chi-square; Arabic text classification; Light stemming; Mutual information; Information gain; SVM; Decision tree

In text mining, feature selection (FS) is a common method for reducing the dimensionality of the feature space and improving classification accuracy. In this paper, we propose an improved Chi-square feature selection method (hereafter ImpCHI) for Arabic text classification to enhance classification performance. We also compare this improved Chi-square with three traditional feature selection metrics, namely mutual information, information gain, and Chi-square. Building on our previous work, we extend the assessment of the method to other evaluation measures using an SVM classifier. For this purpose, a dataset of 5070 Arabic documents is classified into six independent classes. The experimental findings show that combining the ImpCHI method with the SVM classifier outperforms the other combinations in terms of precision, recall, and F-measure. This combination significantly improves the performance of the Arabic text classification model. The best F-measure obtained for this model is 90.50%, when the number of features is 900. (C) 2018 The Authors. Production and hosting by Elsevier B.V.
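The abstract does not define ImpCHI itself, so the following is only a minimal sketch of the standard Chi-square baseline it is compared against, not the authors' improved method. It scores each term against each class from a 2x2 document-count table and keeps the top-scoring terms. The function names (`chi_square`, `select_features`), the max-over-classes aggregation, and the toy English tokens are all illustrative assumptions; the paper's preprocessing (for example, light stemming of Arabic text) is not reproduced here.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Standard Chi-square statistic for one term/class 2x2 table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the term or class carries no signal
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs, labels, k):
    """Return the k terms with the highest Chi-square score.

    Each term's score is its maximum over all classes (an assumed
    aggregation; averaging over classes is another common choice).
    """
    n_docs = len(docs)
    class_sizes = Counter(labels)
    doc_freq = Counter()        # term -> number of documents containing it
    class_doc_freq = Counter()  # (term, class) -> same count within the class
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            doc_freq[term] += 1
            class_doc_freq[(term, label)] += 1

    scores = {}
    for term, df in doc_freq.items():
        best = 0.0
        for label, size in class_sizes.items():
            a = class_doc_freq[(term, label)]
            b = df - a
            c = size - a
            d = n_docs - size - b
            best = max(best, chi_square(a, b, c, d))
        scores[term] = best

    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


# Toy corpus (hypothetical): two classes, two documents each.
docs = [["goal", "match"], ["goal", "team"], ["vote", "party"], ["vote", "law"]]
labels = ["sport", "sport", "politics", "politics"]
top_terms = select_features(docs, labels, 2)
```

In a full pipeline, the selected terms would define the feature vocabulary passed to a classifier such as an SVM; the abstract reports its best result with 900 selected features.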
