Article

Comparing pre-trained language models for Spanish hate speech detection

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 166, Issue -, Pages -

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2020.114120

Keywords

Hate speech; Transfer learning; BERT; BETO; Natural language processing; Text classification

Funding

  1. European Regional Development Fund (ERDF), LIVING-LANG project [RTI2018-094653-B-C21]
  2. Ministry of Science, Innovation and Universities from the Spanish Government [FPI-PRE2019-089310]

Abstract

Nowadays, due to the vast amount of uncontrolled content posted daily on the Web, there has been a huge increase in the dissemination of hate speech worldwide. Social media, blogs and community forums are examples of spaces where people are free to communicate. However, freedom of expression is not always exercised respectfully, since offensive or insulting language is sometimes used. Social media companies often rely on users and content moderators to report this type of content. Nevertheless, given the large amount of content generated every day on the Web, automatic systems based on Natural Language Processing techniques are required to identify abusive language online. To date, most of the systems developed to combat this problem have focused mainly on English content, but the issue is a worldwide concern and other languages such as Spanish are also affected. In this paper, we address the task of Spanish hate speech identification on social media and provide a deeper understanding of the capabilities of new techniques based on machine learning. In particular, we compare the performance of Deep Learning methods and recently pre-trained language models based on Transfer Learning with that of traditional machine learning models. Our main contribution is the achievement of promising results in Spanish by applying multilingual and monolingual pre-trained language models such as BERT, XLM and BETO.
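The comparison described above spans traditional machine learning baselines, Deep Learning methods, and Transformer-based pre-trained models. As a minimal illustrative sketch (not the paper's actual pipeline), a traditional baseline for this kind of text classification can be built from a TF-IDF representation and a linear classifier; the Spanish examples and labels below are invented for illustration only:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Toy, invented examples (1 = hateful, 0 = not hateful); real experiments
# would use an annotated corpus such as the Spanish HatEval tweets.
texts = [
    "Odio a esa gente, no deberian estar aqui",
    "Que bonito dia para pasear por el parque",
    "Son todos unos inutiles, fuera de mi pais",
    "Me encanta la comida de este restaurante",
    "No soporto a ese grupo, son lo peor",
    "Gracias por tu ayuda, eres muy amable",
]
labels = [1, 0, 1, 0, 1, 0]

# Word unigram/bigram TF-IDF features feeding a logistic regression classifier.
baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
    LogisticRegression(max_iter=1000),
)
baseline.fit(texts, labels)

# Macro-averaged F1 is a common metric when the hate/non-hate classes are imbalanced.
preds = baseline.predict(texts)
print(f1_score(labels, preds, average="macro"))
```

A Transformer model such as BETO would instead be fine-tuned on the same training data (for example via the Hugging Face `transformers` library) and compared against this baseline on the same held-out macro-F1 metric.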

