4.8 Article

Contextual Embeddings based on Fine-tuned Urdu-BERT for Urdu threatening content and target identification

Publisher

ELSEVIER
DOI: 10.1016/j.jksuci.2023.101606

Keywords

Threatening content; Target identification; Fine-tuned BERT; Urdu; Twitter; Text representation

Ask authors/readers for more resources

This paper proposes a hierarchical classification model for identifying threatening content and target in Urdu tweets. By utilizing the Urdu-BERT language model and transfer learning, the fine-tuned model achieves state-of-the-art performance in threatening content identification and target identification tasks.
Identification of threatening text on social media platforms is a challenging task. Contrary to the highresource languages, the Urdu language has very limited such approaches and the benchmark approach has an issue of inappropriate data annotation. Therefore, we present robust threatening content and target identification as a hierarchical classification model for Urdu tweets. This study investigates the potential of the Urdu-BERT (Bidirectional Encoder Representations from Transformer) language model to learn universal contextualized representations aiming to showcase its usefulness for binary classification tasks of threatening content and target identification. We propose to exploit a pre-trained Urdu-BERT as a transfer learning model after fine-tuning its parameters on a newly designed Urdu corpus from the Twitter platform. The proposed dataset contains 2,400 tweets manually annotated as threatening or non-threatening at the first level and threatening tweets are further categorized into individual or group at the second level. The performance of fine-tuned Urdu-BERT is compared with the benchmark study and other feature models. Experimental results show that the fine-tuned Urdu-BERT model achieves state-of-the-art performance by obtaining 87.5% accuracy and 87.8% F1-score for threatening content identification and 82.5% accuracy and 83.2% F1-score for target identification task. Furthermore, the proposed model outperforms the benchmark study. (c) 2023 The Author(s). Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.8
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available