Article

Comparing pre-trained language models for Spanish hate speech detection

Journal

Expert Systems with Applications
Volume 166

Publisher

Pergamon-Elsevier Science Ltd
DOI: 10.1016/j.eswa.2020.114120

Keywords

Hate speech; Transfer learning; BERT; BETO; Natural language processing; Text classification

Funding

  1. European Regional Development Fund (ERDF), LIVING-LANG project [RTI2018-094653-B-C21]
  2. Ministry of Science, Innovation and Universities from the Spanish Government [FPI-PRE2019-089310]

Abstract

Nowadays, due to the vast amount of uncontrolled content posted daily on the Web, there has been a huge increase in the dissemination of hate speech worldwide. Social media, blogs and community forums are examples of spaces where people can communicate freely. However, freedom of expression is not always exercised respectfully, since offensive or insulting language is sometimes used. Social media companies often rely on users and content moderators to report this type of content. Nevertheless, given the large amount of content generated every day on the Web, automatic systems based on Natural Language Processing techniques are required to identify abusive language online. To date, most of the systems developed to combat this problem have focused mainly on English content, but the issue is a worldwide concern and therefore other languages such as Spanish are also affected. In this paper, we address the task of Spanish hate speech identification on social media and provide a deeper understanding of the capabilities of new techniques based on machine learning. In particular, we compare the performance of Deep Learning methods and of recently pre-trained language models based on Transfer Learning with that of traditional machine learning models. Our main contribution is the achievement of promising results in Spanish by applying multilingual and monolingual pre-trained language models such as BERT, XLM and BETO.
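As an illustration of the "traditional machine learning" family the paper compares against pre-trained models such as BERT, XLM and BETO, the sketch below builds a TF-IDF plus logistic regression classifier for Spanish text with scikit-learn. This is not the authors' implementation: the toy texts, labels and hyperparameters are illustrative assumptions only, and a real evaluation would use an annotated hate speech corpus and a held-out test split.

```python
# Minimal sketch of a traditional ML baseline for Spanish hate speech
# detection: TF-IDF features + logistic regression. All data below is
# a tiny illustrative toy set, NOT the corpus used in the paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

texts = [
    "te odio, eres horrible",            # illustrative offensive example
    "fuera de aqui, gente como tu",      # illustrative offensive example
    "que dia tan bonito hace hoy",       # illustrative neutral example
    "me encanta esta cancion",           # illustrative neutral example
]
labels = [1, 1, 0, 0]  # 1 = hateful/offensive, 0 = not (illustrative)

# Word unigrams and bigrams weighted by TF-IDF, fed to a linear classifier.
clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), lowercase=True)),
    ("lr", LogisticRegression(max_iter=1000)),
])
clf.fit(texts, labels)

# Predict the label of a new, unseen tweet-like text.
pred = clf.predict(["que cancion tan bonita"])
```

In the study's setup, such a sparse-feature baseline is what the Transfer Learning approaches are measured against: a fine-tuned multilingual (BERT, XLM) or Spanish monolingual (BETO) model replaces the TF-IDF pipeline while the classification task itself stays the same.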
