Article

Ensemble feature selection for single-label text classification: a comprehensive analytical study

Journal

NEURAL COMPUTING & APPLICATIONS
Volume 35, Issue 26, Pages 19235-19251

Publisher

SPRINGER LONDON LTD
DOI: 10.1007/s00521-023-08763-y

Keywords

Text classification; Feature selection; Global; Local; Ensemble feature subsets

Text classification has become a central problem because of the large volume of textual data now available. Feature selection strongly affects classification accuracy and is therefore one of the most important steps in text classification. The literature proposes many feature selection techniques, each ranking features in a different order according to its own selection criterion. This study combines the distinguishing features chosen by different methods to observe which methods succeed or fail when combined. The results show that combining feature selection approaches generally performs better than a single method used alone, although some combinations perform worse than individual methods.
Because of the large amount of textual data now available, text classification is a central problem. Feature selection is one of the most important steps in text classification because it strongly affects classification accuracy, and the literature proposes many feature selection techniques for this task. Each method assigns every feature a score according to its own algorithm and sorts the features by that score; classification is then performed on the top-N features. Features that a method considers important receive high scores and are selected, while features it considers insignificant receive low scores and are discarded. Because the scoring differs from method to method, so do the resulting feature order and the set of distinguishing features selected. Combining these distinguishing features can yield higher classification performance. In this study, classification is therefore performed on features combined from the different orderings produced by each method, which shows which methods are successful or unsuccessful when combined. The study also measures how many features each method selects that the others do not. Accordingly, classification is performed on combined feature subsets of different sizes drawn from two local and two global feature selection methods. Extensive experiments on three benchmark datasets show that combining feature selection approaches generally performs better than a single feature selection method used alone, although some combinations perform worse than individual methods. The result is a comprehensive study of ensemble feature selection in the text classification domain.
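
The combination step described in the abstract can be illustrated with a minimal sketch. It assumes that each method has already produced a score per feature, that the ensemble is the plain union of each method's top-N features, and that the scores shown are made-up toy values. The function names (`top_n`, `ensemble_subset`, `distinct_count`) are hypothetical, and the paper's actual scoring methods and combination rule may differ.

```python
from typing import Dict, List, Set


def top_n(scores: Dict[str, float], n: int) -> List[str]:
    """Return the n highest-scoring features (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


def ensemble_subset(score_tables: List[Dict[str, float]], n: int) -> Set[str]:
    """Union of the top-n features selected by each method (an assumed rule)."""
    selected: Set[str] = set()
    for scores in score_tables:
        selected.update(top_n(scores, n))
    return selected


def distinct_count(a: Dict[str, float], b: Dict[str, float], n: int) -> int:
    """Count features in the top-n of method a that are absent from the top-n of b."""
    return len(set(top_n(a, n)) - set(top_n(b, n)))


# Toy, made-up scores from two hypothetical feature selection methods.
method_a = {"goal": 0.9, "match": 0.8, "team": 0.5, "vote": 0.2, "tax": 0.1}
method_b = {"vote": 0.95, "goal": 0.7, "tax": 0.6, "match": 0.3, "team": 0.1}

subset = ensemble_subset([method_a, method_b], n=2)
differing = distinct_count(method_a, method_b, n=2)
print(sorted(subset), differing)
```

With these toy scores, the two methods share one top-2 feature and each contributes one feature the other misses, so the combined subset is larger than either individual top-2 list. Whether such a union improves accuracy depends on the methods combined, which is the question the study examines.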