☆ 4.7 Article

Leveraging the meta-embedding for text classification in a resource-constrained language

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE (2023)

期刊

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE

卷 124, 期 -, 页码 -

出版社

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.engappai.2023.106586

关键词

Natural language processing; Text classification; Text corpora; Semantic feature extraction; Meta-embedding; Deep learning

类别

Automation & Control Systems Computer Science, Artificial Intelligence Engineering, Multidisciplinary Engineering, Electrical & Electronic

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

智能总结 New
摘要

This paper proposes an intelligent text classification framework called AVG-M+CNN for a resource-constrained language like Bengali. The framework includes an average meta-embedding feature fusion module and a convolutions neural network module. It also introduces an automatic hyperparameter tuning and selection algorithm to enhance the performance. The proposed models are evaluated using intrinsic and extrinsic evaluators, and the AVG-M+CNN model achieves high accuracy rates on multiple Bengali corpora.

This paper proposes an intelligent text classification framework for a resource-constrained language like Bengali, which is considered a challenging task due to the lack of standard corpora, appropriate hyper-parameter tuning method, and pre-trained language-specific embedding. The proposed framework comprises an average meta-embedding feature fusion module and a convolutions neural network module called AVG-M+CNN. This work also proposes an algorithm, i.e., automatic hyperparameter tuning and selection, for enhancing the performance of the AVG-M+CN N technique. A l l meta-embedding models are evaluated using the intrinsic, e.g., semantic, syntactic, relatedness word similarity, analog y tasks and extrinsic evaluators. The intrinsic evaluator evaluates 200 Bengali semantic, syntactic and relatedness word pairs. Spearman (o), Pearson (?) and cosine similarity correlations are used to evaluate 18 individual embedding and 9 meta-embedding models. The 3COSADD and 3COSMU L evaluators evaluate the 300 analog y tasks. The extrinsic evaluator evaluates a total of 156 classification models on four corpora: BARD, IndicNLP, Prothom-Alo and BTCC 11 (a newly developed corpus having eleven distinct categories). Among these, the AVG-M+CN N model achieves the highest accuracy regarding four Bengal i corpora: 95.92 & PLUSMN;.001% for BARD, 93.10 & PLUSMN;.001% for Prothom-Alo, 90.07 & PLUSMN;.001% for BTCC 11 and 87.44 & PLUSMN;.001% for IndicNLP, respectively.

Leveraging the meta-embedding for text classification in a resource-constrained language

期刊

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE

出版社

PERGAMON-ELSEVIER SCIENCE LTD

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Leveraging the meta-embedding for text classification in a resource-constrained language

期刊

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE

出版社

PERGAMON-ELSEVIER SCIENCE LTD

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文