Journal
JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES
Volume 32, Issue 2, Pages 225-231
Publisher
ELSEVIER
DOI: 10.1016/j.jksuci.2018.05.010
Keywords
Feature selection; Chi-square; Arabic text classification; Light stemming; Mutual information; Information gain; SVM; Decision tree
In text mining, feature selection (FS) is a common method for reducing the very large feature space and improving classification accuracy. In this paper, we propose an improved method for Arabic text classification that employs an improved Chi-square feature selection method (hereafter ImpCHI) to enhance classification performance. We also compare this improved Chi-square with three traditional feature selection metrics, namely mutual information, information gain and Chi-square. Building on our previous work, we extend it to assess the method using other evaluation measures with an SVM classifier. For this purpose, a dataset of 5070 Arabic documents is classified into six independent classes. The experimental findings show that combining the ImpCHI method with the SVM classifier outperforms the other combinations in terms of precision, recall and F-measure. This combination significantly improves the performance of the Arabic text classification model. The best F-measure obtained for this model is 90.50%, when the number of features is 900. © 2018 The Authors. Production and hosting by Elsevier B.V.
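For readers unfamiliar with Chi-square feature selection, the following is a minimal sketch of the traditional Chi-square metric that the abstract uses as a baseline. It is not the ImpCHI method, whose details the abstract does not give. The function names `chi_square` and `top_k_terms`, the choice to take the maximum score over classes, and the toy English tokens are all assumptions made for illustration; Arabic tokenization and light stemming are assumed to have been done beforehand.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Standard chi-square score for one term and one class.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def top_k_terms(docs, labels, k):
    """Rank terms by their maximum chi-square over classes (an assumed
    aggregation choice) and return the k highest-scoring terms."""
    n_docs = len(docs)
    class_sizes = Counter(labels)
    df = Counter()        # term -> number of documents containing it
    df_class = Counter()  # (term, class) -> number of documents containing it
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            df_class[(term, label)] += 1

    scores = {}
    for term, total in df.items():
        best = 0.0
        for cls, size in class_sizes.items():
            a = df_class[(term, cls)]
            b = total - a
            c = size - a
            d = n_docs - size - b
            best = max(best, chi_square(a, b, c, d))
        scores[term] = best
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus with hypothetical, pre-tokenized documents.
docs = [["goal", "match"], ["goal", "team"], ["bank", "loan"], ["bank", "rate"]]
labels = ["sport", "sport", "econ", "econ"]
selected = top_k_terms(docs, labels, 2)
```

In a setting like the one described above, `k` would play the role of the number of retained features (the abstract reports its best result at 900), and the selected terms would then form the vocabulary passed to the classifier.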