Article

Contextual Embeddings based on Fine-tuned Urdu-BERT for Urdu threatening content and target identification

JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES (2023)

Journal

JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES

Volume 35, Issue 7

Publisher

ELSEVIER

DOI: 10.1016/j.jksuci.2023.101606

Keywords

Threatening content; Target identification; Fine-tuned BERT; Urdu; Twitter; Text representation

Automated Summary

This paper proposes a hierarchical classification model for identifying threatening content and its target in Urdu tweets. Using the Urdu-BERT language model and transfer learning, the fine-tuned model achieves state-of-the-art performance on both the threatening content identification and target identification tasks.

Abstract

Identifying threatening text on social media platforms is a challenging task. In contrast to high-resource languages, Urdu has very few such approaches, and the benchmark approach suffers from inappropriate data annotation. Therefore, we present a robust hierarchical classification model for threatening content and target identification in Urdu tweets. This study investigates the potential of the Urdu-BERT (Bidirectional Encoder Representations from Transformers) language model to learn universal contextualized representations, aiming to showcase its usefulness for the binary classification tasks of threatening content identification and target identification. We propose to exploit a pre-trained Urdu-BERT as a transfer learning model after fine-tuning its parameters on a newly designed Urdu corpus collected from Twitter. The proposed dataset contains 2,400 tweets manually annotated as threatening or non-threatening at the first level; threatening tweets are further categorized as targeting an individual or a group at the second level. The performance of the fine-tuned Urdu-BERT is compared with the benchmark study and other feature models. Experimental results show that the fine-tuned Urdu-BERT model achieves state-of-the-art performance, obtaining 87.5% accuracy and an 87.8% F1-score for threatening content identification and 82.5% accuracy and an 83.2% F1-score for target identification. Furthermore, the proposed model outperforms the benchmark study. © 2023 The Author(s). Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
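The two-level scheme described in the abstract can be illustrated with a minimal sketch. It shows only the routing logic: a tweet is first labeled threatening or non-threatening, and only threatening tweets go on to the individual/group decision. The function `classify_hierarchically` and the two toy models are hypothetical placeholders introduced for illustration. They are not the authors' code and do not involve Urdu-BERT; in practice each callable would wrap a fine-tuned classifier.

```python
from typing import Callable, Optional, Tuple

# A binary classifier here is any callable mapping tweet text to True/False.
BinaryClassifier = Callable[[str], bool]


def classify_hierarchically(
    tweet: str,
    is_threatening: BinaryClassifier,
    targets_group: BinaryClassifier,
) -> Tuple[str, Optional[str]]:
    """Return (level-1 label, level-2 label or None)."""
    # Level 1: threatening vs. non-threatening.
    if not is_threatening(tweet):
        return ("non-threatening", None)
    # Level 2: only threatening tweets are assigned a target.
    target = "group" if targets_group(tweet) else "individual"
    return ("threatening", target)


# Hypothetical stand-ins for the two fine-tuned models (illustration only):
# they match fixed marker strings instead of running a language model.
def toy_threat_model(tweet: str) -> bool:
    return "THREAT" in tweet


def toy_target_model(tweet: str) -> bool:
    return "GROUP" in tweet


if __name__ == "__main__":
    for text in ["hello", "THREAT to you", "THREAT to GROUP"]:
        print(text, "->", classify_hierarchically(text, toy_threat_model, toy_target_model))
```

Because the second-level model is consulted only when the first level says "threatening", a non-threatening tweet never receives a target label, which mirrors the annotation scheme of the dataset.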
