4.7 Article

Leveraging the meta-embedding for text classification in a resource-constrained language

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.engappai.2023.106586

Keywords

Natural language processing; Text classification; Text corpora; Semantic feature extraction; Meta-embedding; Deep learning

This paper proposes an intelligent text classification framework, AVG-M+CNN, for resource-constrained languages such as Bengali. The framework combines an average meta-embedding feature fusion module with a convolutional neural network module. It also introduces an automatic hyperparameter tuning and selection algorithm to improve performance. The proposed models are evaluated with intrinsic and extrinsic evaluators, and the AVG-M+CNN model achieves the highest accuracy on four Bengali corpora.
This paper proposes an intelligent text classification framework for a resource-constrained language, Bengali. Text classification in such a language is challenging because of the lack of standard corpora, appropriate hyperparameter tuning methods, and pre-trained language-specific embeddings. The proposed framework, called AVG-M+CNN, comprises an average meta-embedding feature fusion module and a convolutional neural network module. This work also proposes an automatic hyperparameter tuning and selection algorithm to enhance the performance of the AVG-M+CNN technique. All meta-embedding models are evaluated using intrinsic evaluators (semantic, syntactic and relatedness word similarity, and analogy tasks) and extrinsic evaluators. The intrinsic evaluator evaluates 200 Bengali semantic, syntactic and relatedness word pairs. Spearman (ρ), Pearson (r) and cosine similarity correlations are used to evaluate 18 individual embedding models and 9 meta-embedding models. The 3COSADD and 3COSMUL evaluators evaluate the 300 analogy tasks. The extrinsic evaluator evaluates a total of 156 classification models on four corpora: BARD, IndicNLP, Prothom-Alo and BTCC 11 (a newly developed corpus with eleven distinct categories). Among these, the AVG-M+CNN model achieves the highest accuracy on all four Bengali corpora: 95.92 ± 0.001% for BARD, 93.10 ± 0.001% for Prothom-Alo, 90.07 ± 0.001% for BTCC 11 and 87.44 ± 0.001% for IndicNLP.
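
The abstract names average meta-embedding fusion and the 3COSADD and 3COSMUL analogy evaluators without giving details. The sketch below illustrates the general form of these techniques, not the paper's implementation. It assumes all source embeddings share one row-aligned vocabulary, that narrower sources are zero-padded to the widest dimension before averaging, and that 3COSMUL uses cosines shifted into [0, 1]. All function names and the `eps` value are hypothetical choices for illustration.

```python
import numpy as np


def l2_normalize(m):
    """Scale every row of m to unit length (guards against zero rows)."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    return m / np.clip(norms, 1e-12, None)


def average_meta_embedding(sources):
    """Average several row-aligned embedding matrices into one.

    Assumption: each source is L2-normalized, then zero-padded on the
    right to the widest dimension so the matrices can be averaged.
    """
    dim = max(m.shape[1] for m in sources)
    padded = [
        np.pad(l2_normalize(m), ((0, 0), (0, dim - m.shape[1])))
        for m in sources
    ]
    return np.mean(padded, axis=0)


def three_cos_add(emb, a, b, c):
    """Solve a : b :: c : ? by maximizing cos(d, b - a + c)."""
    e = l2_normalize(emb)
    scores = e @ (e[b] - e[a] + e[c])
    scores[[a, b, c]] = -np.inf  # exclude the query words themselves
    return int(np.argmax(scores))


def three_cos_mul(emb, a, b, c, eps=1e-3):
    """Solve a : b :: c : ? by maximizing cos(d,b) * cos(d,c) / (cos(d,a) + eps)."""
    e = l2_normalize(emb)

    def sim(i):
        # Cosine similarity to word i, shifted from [-1, 1] into [0, 1].
        return (e @ e[i] + 1.0) / 2.0

    scores = sim(b) * sim(c) / (sim(a) + eps)
    scores[[a, b, c]] = -np.inf  # exclude the query words themselves
    return int(np.argmax(scores))
```

Each analogy function takes row indices into the embedding matrix and returns the index of the predicted fourth word. An analogy benchmark would then count how often that index matches the expected answer.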
