4.4 Article

Attention-based multi-modal fusion sarcasm detection

Journal

JOURNAL OF INTELLIGENT & FUZZY SYSTEMS
Volume 44, Issue 2, Pages 2097-2108

Publisher

IOS PRESS
DOI: 10.3233/JIFS-213501

Keywords

Multi-modal; sarcasm detection; Attention; ViT; D-BiGRU

This paper proposes an attention-based multi-modal fusion model for sarcasm detection. It uses a Vision Transformer (ViT) to extract image features and a Double-Layer Bi-Directional Gated Recurrent Unit (D-BiGRU) to extract text features. The features of the two modalities are fused into one feature vector, enhanced by attention, and then used for prediction. On the baseline datasets, the model's F1-score and accuracy are 0.71% and 0.38% higher, respectively, than those of the best baseline model.
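The fusion step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how attention is applied, so this example assumes one plausible reading in which each modality is weighted by a softmax over dot-product scores against a query vector, the weighted vectors are concatenated, and a logistic layer produces the prediction. The function names, the query vector, the classifier weights, and the 4-dimensional feature vectors are all hypothetical placeholders; in the paper the features would come from ViT and D-BiGRU.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attention_fuse(image_feat, text_feat, query):
    """Weight each modality by softmax(query . feature), then concatenate.

    Assumption: both feature vectors share the query's dimension.
    """
    weights = softmax([dot(query, image_feat), dot(query, text_feat)])
    fused = [weights[0] * x for x in image_feat] + [weights[1] * x for x in text_feat]
    return fused, weights


def predict_sarcasm(fused, clf_weights, bias=0.0):
    """Logistic output: probability that the image-text pair is sarcastic."""
    return 1.0 / (1.0 + math.exp(-(dot(fused, clf_weights) + bias)))


# Hypothetical features standing in for ViT (image) and D-BiGRU (text) outputs.
image_feat = [0.2, -0.1, 0.4, 0.3]
text_feat = [0.5, 0.1, -0.2, 0.6]
query = [1.0, 0.0, 0.0, 1.0]   # assumed attention query (learned in practice)
clf_weights = [0.1] * 8        # assumed classifier weights (learned in practice)

fused, weights = attention_fuse(image_feat, text_feat, query)
prob = predict_sarcasm(fused, clf_weights)
print(weights, prob)
```

In a trained model the query and classifier weights would be learned jointly with the two encoders; here they are fixed only to keep the sketch self-contained.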
Sarcasm is a way of expressing a person's thoughts in which the intended meaning is often the opposite of the apparent meaning. Previous work on sarcasm detection focused mainly on text, but most information today is multi-modal, combining text and images. Multi-modal sarcasm detection has therefore become an increasingly active research topic. To better detect the intended meaning of multi-modal sarcastic content, this paper proposes a multi-modal fusion sarcasm detection model based on the attention mechanism, which introduces a Vision Transformer (ViT) to extract image features and a Double-Layer Bi-Directional Gated Recurrent Unit (D-BiGRU) to extract text features. The features of the two modalities are fused into one feature vector, and the prediction is made after attention enhancement. On the baseline datasets, the proposed model's F1-score and accuracy are 0.71% and 0.38% higher, respectively, than those of the best baseline model.
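For readers unfamiliar with the two reported metrics, the following sketch shows how accuracy and F1-score are commonly computed for a binary classifier. It assumes the sarcastic class is the positive label (1) and that F1 is the binary F1 of that class; the abstract does not state which F1 variant the authors used. The labels below are made-up toy data, not results from the paper.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, f1


# Toy labels: 1 = sarcastic, 0 = not sarcastic (illustrative only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

acc, f1 = accuracy_and_f1(y_true, y_pred)
print(acc, f1)
```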
