Article

Comparing pre-trained language models for Spanish hate speech detection

Journal

Expert Systems with Applications
Volume 166

Publisher

Pergamon-Elsevier Science Ltd
DOI: 10.1016/j.eswa.2020.114120

Keywords

Hate speech; Transfer learning; BERT; BETO; Natural language processing; Text classification

Funding

  1. European Regional Development Fund (ERDF), LIVING-LANG project [RTI2018-094653-B-C21]
  2. Ministry of Science, Innovation and Universities from the Spanish Government [FPI-PRE2019-089310]

Abstract

Nowadays, due to the vast amount of uncontrolled content posted daily on the Web, there has been a huge increase in the dissemination of hate speech worldwide. Social media, blogs and community forums are examples of spaces where people can communicate freely. However, freedom of expression is not always exercised respectfully, since offensive or insulting language is sometimes used. Social media companies often rely on users and content moderators to report this type of content. Nevertheless, given the large amount of content generated every day on the Web, automatic systems based on Natural Language Processing techniques are required to identify abusive language online. To date, most of the systems developed to combat this problem have focused mainly on English content, but the issue is a worldwide concern and therefore other languages such as Spanish are also affected. In this paper, we address the task of Spanish hate speech identification on social media and provide a deeper understanding of the capabilities of new techniques based on machine learning. In particular, we compare the performance of Deep Learning methods and of recently pre-trained language models based on Transfer Learning with that of traditional machine learning models. Our main contribution is the achievement of promising results in Spanish by applying multilingual and monolingual pre-trained language models such as BERT, XLM and BETO.
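As an illustration of the "traditional machine learning" family the paper compares against pre-trained models such as BERT, XLM and BETO, the sketch below builds a TF-IDF plus logistic regression classifier for Spanish text with scikit-learn. This is not the authors' implementation: the toy texts, labels and hyperparameters are illustrative assumptions only, and a real evaluation would use an annotated hate speech corpus and a held-out test split.

```python
# Minimal sketch of a traditional ML baseline for Spanish hate speech
# detection: TF-IDF features + logistic regression. All data below is
# a tiny illustrative toy set, NOT the corpus used in the paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

texts = [
    "te odio, eres horrible",            # illustrative offensive example
    "fuera de aqui, gente como tu",      # illustrative offensive example
    "que dia tan bonito hace hoy",       # illustrative neutral example
    "me encanta esta cancion",           # illustrative neutral example
]
labels = [1, 1, 0, 0]  # 1 = hateful/offensive, 0 = not (illustrative)

# Word unigrams and bigrams weighted by TF-IDF, fed to a linear classifier.
clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), lowercase=True)),
    ("lr", LogisticRegression(max_iter=1000)),
])
clf.fit(texts, labels)

# Predict the label of a new, unseen tweet-like text.
pred = clf.predict(["que cancion tan bonita"])
```

In the study's setup, such a sparse-feature baseline is what the Transfer Learning approaches are measured against: a fine-tuned multilingual (BERT, XLM) or Spanish monolingual (BETO) model replaces the TF-IDF pipeline while the classification task itself stays the same.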
