Article

Comparing pre-trained language models for Spanish hate speech detection

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 166, Issue -, Pages -

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2020.114120

Keywords

Hate speech; Transfer learning; BERT; BETO; Natural language processing; Text classification

Funding

  1. European Regional Development Fund (ERDF), LIVING-LANG project [RTI2018-094653-B-C21]
  2. Ministry of Science, Innovation and Universities from the Spanish Government [FPI-PRE2019-089310]

Abstract

Nowadays, due to the vast amount of uncontrolled content posted daily on the Web, there has been a huge increase in the dissemination of hate speech worldwide. Social media, blogs and community forums are examples of spaces where people are free to communicate. However, freedom of expression is not always exercised respectfully, since offensive or insulting language is sometimes used. Social media companies often rely on users and content moderators to report this type of content. Nevertheless, given the large amount of content generated every day on the Web, automatic systems based on Natural Language Processing techniques are required to identify abusive language online. To date, most of the systems developed to combat this problem have focused mainly on English content, but the issue is a worldwide concern and other languages such as Spanish are also affected. In this paper, we address the task of Spanish hate speech identification on social media and provide a deeper understanding of the capabilities of new techniques based on machine learning. In particular, we compare the performance of Deep Learning methods and recently pre-trained language models based on Transfer Learning with that of traditional machine learning models. Our main contribution is the achievement of promising results in Spanish by applying multilingual and monolingual pre-trained language models such as BERT, XLM and BETO.
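The comparison described above spans traditional machine learning baselines, Deep Learning methods, and Transformer-based pre-trained models. As a minimal illustrative sketch (not the paper's actual pipeline), a traditional baseline for this kind of text classification can be built from a TF-IDF representation and a linear classifier; the Spanish examples and labels below are invented for illustration only:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Toy, invented examples (1 = hateful, 0 = not hateful); real experiments
# would use an annotated corpus such as the Spanish HatEval tweets.
texts = [
    "Odio a esa gente, no deberian estar aqui",
    "Que bonito dia para pasear por el parque",
    "Son todos unos inutiles, fuera de mi pais",
    "Me encanta la comida de este restaurante",
    "No soporto a ese grupo, son lo peor",
    "Gracias por tu ayuda, eres muy amable",
]
labels = [1, 0, 1, 0, 1, 0]

# Word unigram/bigram TF-IDF features feeding a logistic regression classifier.
baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
    LogisticRegression(max_iter=1000),
)
baseline.fit(texts, labels)

# Macro-averaged F1 is a common metric when the hate/non-hate classes are imbalanced.
preds = baseline.predict(texts)
print(f1_score(labels, preds, average="macro"))
```

A Transformer model such as BETO would instead be fine-tuned on the same training data (for example via the Hugging Face `transformers` library) and compared against this baseline on the same held-out macro-F1 metric.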

