4.6 Article

An Effective BERT-Based Pipeline for Twitter Sentiment Analysis: A Case Study in Italian

Journal

SENSORS
Volume 21, Issue 1, Pages -

Publisher

MDPI
DOI: 10.3390/s21010133

Keywords

sentiment analysis; NLP; language models; BERT; Italian language

Funding

  1. Italian project IDEHA-Innovation for Data Elaboration in Heritage Areas - PON Ricerca e Innovazione


This work presents a new method for sentiment analysis on Twitter, in which tweet jargon is transformed into plain text and then classified using the pre-trained language model BERT. The results show its effectiveness for Italian and its potential applicability to other languages.
Over the last decade, industrial and academic communities have increased their focus on sentiment analysis techniques, especially as applied to tweets. State-of-the-art results have recently been achieved using language models trained from scratch on corpora made up exclusively of tweets, in order to better handle Twitter jargon. This work introduces a different, two-step approach to Twitter sentiment analysis. First, the tweet jargon, including emojis and emoticons, is transformed into plain text using procedures that are language-independent or easily adapted to different languages. Second, the resulting tweets are classified using the language model BERT, pre-trained on plain text rather than on tweets, for two reasons: (1) models pre-trained on plain text are readily available in many languages, which avoids the resource- and time-consuming training of a model on tweets from scratch; (2) available plain-text corpora are larger than tweet-only ones, which allows better performance. A case study applying the approach to Italian is presented, together with a comparison against existing Italian solutions. The results show the effectiveness of the approach and indicate that, because its methodology is general, it may also be promising for other languages.
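
The first step described in the abstract, turning tweet jargon into plain text, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the emoticon table, the placeholder words `url` and `utente`, and the function name `tweet_to_plain_text` are all assumptions made for illustration, and a real system would need a far larger emoji and emoticon lexicon.

```python
import re

# Hypothetical emoticon/emoji-to-text table (illustrative only; the
# resources actually used in the paper are not given in this abstract).
EMOTICON_TEXT = {
    ":)": "felice",
    ":(": "triste",
    "\U0001F600": "faccina sorridente",
}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")


def tweet_to_plain_text(tweet: str) -> str:
    """Rewrite Twitter-specific tokens as ordinary words."""
    # Replace URLs first so that ":/" inside "https://" is never
    # mistaken for an emoticon by a larger table.
    text = URL_RE.sub("url", tweet)
    # Replace user mentions with a generic placeholder (assumed word).
    text = MENTION_RE.sub("utente", text)
    # Keep the hashtag word but drop the leading "#".
    text = HASHTAG_RE.sub(r"\1", text)
    # Expand each known emoticon or emoji into its textual description.
    for symbol, words in EMOTICON_TEXT.items():
        text = text.replace(symbol, f" {words} ")
    # Collapse the extra whitespace introduced by the substitutions.
    return " ".join(text.split())
```

In the second step, the resulting string would be passed to a BERT model pre-trained on plain Italian text and fine-tuned for sentiment classification; that part is omitted here because it depends on a specific model checkpoint that the abstract does not name.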
