Article

Text length considered adaptive bagging ensemble learning algorithm for text classification

Journal

MULTIMEDIA TOOLS AND APPLICATIONS
Volume 82, Issue 18, Pages 27681-27706

Publisher

SPRINGER
DOI: 10.1007/s11042-023-14578-9

Keywords

Ensemble learning; Base classifier; Text classification; Deep learning; Random sampling

Ensemble learning is widely used in the field of text classification to construct strong classifiers. A text length considered adaptive Bagging ensemble learning algorithm (TC_Bagging) is proposed to improve text classification accuracy. It compares different deep learning methods on long and short texts, constructs an optimal base classifier group for each, and uses a random sampling method based on an adaptive threshold group to train on text sample subsets of different lengths. The algorithm combines a text vector generation algorithm based on smooth inverse frequency with the traditional weighted voting classifier ensemble method to achieve better classification performance than baseline methods.
Ensemble learning constructs strong classifiers by training multiple weak classifiers, and is widely used in the field of text classification. To improve text classification accuracy, a text length considered adaptive bootstrap aggregating (Bagging) ensemble learning algorithm (called TC_Bagging) is proposed. First, the performance of several typical deep learning methods on long and short texts is compared, and optimal base classifier groups are constructed for long and for short texts. Second, a random sampling method based on an adaptive threshold group is proposed to train on long-text and short-text sample subsets while retaining the proportions of samples in the different categories. Finally, to avoid the loss of accuracy that the sampling process may cause, the smooth inverse frequency (SIF) based text vector generation algorithm is combined with the traditional weighted voting classifier ensemble method to obtain the final classification result. Comparing TC_Bagging with several baseline methods on three datasets, our evaluation suggests that the results of TC_Bagging are approximately 0.120, 0.300 and 0.060 better than those of RF, WAVE, RF_WMVE and RF_WAVE in terms of average F-1, average sensitivity and average specificity, respectively, showing that TC_Bagging has a clear advantage over typical ensemble learning algorithms.
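
The sampling and voting steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the single fixed length threshold (the paper uses an adaptive threshold group), the whitespace tokenisation, and the classifier weights are all assumptions made for illustration, and the SIF vector step is omitted.

```python
import random
from collections import Counter, defaultdict


def split_by_length(samples, threshold):
    """Partition (text, label) pairs into short and long groups by token count.

    Assumption: one fixed threshold and whitespace tokens; the paper
    instead adapts a group of thresholds.
    """
    short = [s for s in samples if len(s[0].split()) <= threshold]
    long_ = [s for s in samples if len(s[0].split()) > threshold]
    return short, long_


def stratified_bootstrap(samples, rng):
    """Resample with replacement inside each label.

    Every label keeps its original count, so the class proportions
    of the input are retained in the drawn subset.
    """
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))
    subset = []
    for group in by_label.values():
        subset.extend(rng.choices(group, k=len(group)))
    return subset


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight across classifiers."""
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


# Hypothetical toy data: (text, label) pairs.
data = [
    ("great", "pos"),
    ("awful film", "neg"),
    ("not good", "neg"),
    ("a long and detailed review praising the plot", "pos"),
    ("a long and detailed review criticising the pacing", "neg"),
]
short_texts, long_texts = split_by_length(data, threshold=3)
short_subset = stratified_bootstrap(short_texts, random.Random(0))
final_label = weighted_vote(["pos", "neg", "neg"], [0.9, 0.3, 0.4])
```

In this sketch each base classifier would be trained on one such subset, with separate classifier groups for the short and the long partition. Resampling inside each label is one simple way to keep category proportions fixed; the paper's own sampling procedure may differ.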
