Journal
PATTERN RECOGNITION AND COMPUTER VISION, PT III
Volume 13021, Issue -, Pages 3-15
Publisher
SPRINGER INTERNATIONAL PUBLISHING AG
DOI: 10.1007/978-3-030-88010-1_1
Keywords
Target-oriented multimodal sentiment classification; BERT architecture; Recurrent attention
This paper proposes a recurrent attention network (SaliencyBERT) for target-oriented multimodal sentiment classification. The model combines textual and visual information, uses a recurrent attention mechanism to capture target-sensitive visual representations, and achieves competitive performance on two multimodal Twitter datasets.
As multimodal data become increasingly popular on social media platforms, it is desirable to enhance text-based approaches with other important data sources (e.g., images) for the sentiment classification of social media posts. However, existing approaches primarily rely on the textual content or are designed for coarse-grained multimodal sentiment classification. In this paper, we propose a recurrent attention network (called SaliencyBERT) over the BERT architecture for Target-oriented Multimodal Sentiment Classification (TMSC). Specifically, we first adopt BERT and ResNet to capture the intra-modality dynamics of the textual content and the visual information, respectively. Then, we design a recurrent attention mechanism, which derives target-sensitive visual representations, to capture the inter-modality dynamics. With recurrent attention, our model progressively optimizes the alignment of target-sensitive textual and visual features and produces an output after a fixed number of time steps. Finally, we combine the losses of all time steps for deep supervision, which prevents slow convergence and overfitting. Our empirical results show that the proposed model consistently outperforms single-modal methods and achieves comparable or even better performance than several highly competitive methods on two multimodal datasets from Twitter.
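The abstract outlines a concrete pipeline: BERT and ResNet extract intra-modality features, a recurrent attention mechanism aligns a target-aware text state with visual region features over a fixed number of time steps, and the losses of all time steps are summed for deep supervision. The following PyTorch sketch illustrates that loop under stated assumptions; the module choices (multi-head cross-attention, a GRU cell for the state update), the dimensions, and the number of steps are illustrative guesses, not the paper's exact SaliencyBERT architecture.

# Minimal sketch of the recurrent attention idea described in the abstract.
# All module names, dimensions, and the number of time steps are assumptions;
# the published SaliencyBERT architecture may differ in its details.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentCrossAttention(nn.Module):
    """Progressively aligns target-sensitive text features with visual features."""

    def __init__(self, d_model=768, n_heads=8, n_steps=3, n_classes=3):
        super().__init__()
        self.n_steps = n_steps
        # Cross-modal attention: the text/target state queries the visual regions.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.fuse = nn.GRUCell(d_model, d_model)   # recurrent state update
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, text_feat, visual_feat):
        # text_feat:   (B, d)    e.g. a BERT target/[CLS] representation
        # visual_feat: (B, R, d) e.g. ResNet region features projected to d
        state = text_feat
        logits_per_step = []
        for _ in range(self.n_steps):
            # Derive a target-sensitive visual summary for the current state.
            attended, _ = self.cross_attn(
                state.unsqueeze(1), visual_feat, visual_feat)
            # Fold the visual summary back into the recurrent state.
            state = self.fuse(attended.squeeze(1), state)
            # Emit a prediction at every time step (for deep supervision).
            logits_per_step.append(self.classifier(state))
        return logits_per_step

def deep_supervision_loss(logits_per_step, labels):
    # Sum the classification losses of all time steps, as the abstract describes.
    return sum(F.cross_entropy(logits, labels) for logits in logits_per_step)

A training step would call deep_supervision_loss on the list of per-step logits and backpropagate once, so every time step receives a direct gradient signal; this is the deep-supervision mechanism the abstract credits with avoiding slow convergence and overfitting.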