Article

Enhancement of a multi-dialectal sentiment analysis system by the detection of the implied sarcastic features

KNOWLEDGE-BASED SYSTEMS (2021)

Journal

KNOWLEDGE-BASED SYSTEMS

Volume 227, Issue -, Pages -

Publisher

ELSEVIER

DOI: 10.1016/j.knosys.2021.107232

Keywords

Sentiment analysis; Irony; Sarcasm; Offensive language; Deep learning; Classical machine learning

Category

Computer Science, Artificial Intelligence

Summary

This article discusses how to improve the accuracy of a sentiment analysis system by exploiting sarcastic features, which is challenging because sarcasm is implicit and marked by contextual incongruity. The authors extracted features, built sentiment, offensive, and sarcastic lexicons, and collected corpora to enhance the system.

Abstract

Sentiment analysis is an NLP task that has attracted many researchers working on various languages and, more recently, on Arabic. The task raises several challenges, including sarcasm detection. In this article, we aim to exploit sarcastic characteristics to improve the accuracy of a sentiment analysis system. Sarcasm is difficult to detect because it is implicit and characterized by the presence of positive words in a negative context. We therefore extracted a variety of features to capture context incongruity and the opposition between objective and subjective sentences. Offensive language and hate speech correspond to expressions that hurt others. The detection of offensive language is based on identifying offensive terms, which are strongly negative and help detect negative expressions. Thus, we constructed sentiment, offensive, and sarcastic lexicons both manually and automatically, and collected existing ones. In the same way, we collected many corpora, either ironic (sarcastic, offensive) or sentimental (positive, negative). As sarcasm is a major challenge for the sentiment analysis system, we built a balanced system that contains positive and negative (sarcastic, offensive) tweets. Since the analyzed corpus is multi-dialectal, we used a cross-dialect lexicon that retains meaning when passing from one dialect to another. Beyond the characteristics common to Arabic dialects, the classification was enhanced by detecting the specificities of some dialects that use negation clitics as well as negation words to negate a term. The experiments show that enhancing the sentiment analysis system with sarcastic features improved the results by 8%, reaching an accuracy of 84.17% with a classical machine learning approach and 80.36% with a deep learning approach. The classical machine learning approach was subsequently improved by expanding the BOW lexicon and reducing the feature vector, reaching an accuracy of 89.24%. This method is multilingual because the built model can be language independent: it is enough to have the corresponding resources to apply the system to other languages. © 2021 Elsevier B.V. All rights reserved.
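
As a rough illustration of the lexicon-based idea described in the abstract, the sketch below counts sentiment and offensive terms in a tokenized tweet and raises a simple incongruity flag when positive and negative terms co-occur. It is a minimal sketch under stated assumptions, not the authors' feature set: the lexicons `POSITIVE`, `NEGATIVE`, and `OFFENSIVE` are hypothetical English placeholders (the paper works with Arabic, multi-dialectal lexicons), and the function name `lexicon_features` and the `incongruity` rule are illustrative simplifications.

```python
from typing import Dict, Iterable

# Hypothetical toy lexicons, for illustration only. The paper's lexicons are
# Arabic and cross-dialectal, and are not reproduced here.
POSITIVE = {"great", "love", "wonderful"}
NEGATIVE = {"delay", "broken", "traffic"}
OFFENSIVE = {"idiot"}


def lexicon_features(tokens: Iterable[str]) -> Dict[str, int]:
    """Count lexicon hits and flag positive/negative co-occurrence."""
    lowered = [t.lower() for t in tokens]
    pos = sum(t in POSITIVE for t in lowered)
    neg = sum(t in NEGATIVE for t in lowered)
    off = sum(t in OFFENSIVE for t in lowered)
    return {
        "pos_count": pos,
        "neg_count": neg,
        "off_count": off,
        # Assumed simplification of "positive words in a negative context":
        # the flag is set when both polarities appear in the same tweet.
        "incongruity": int(pos > 0 and neg > 0),
    }


features = lexicon_features("I love being stuck in traffic".split())
```

A feature dictionary of this kind could be appended to a bag-of-words vector before training a classifier; how the paper actually combines its features is not stated in the abstract.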
