Article

TEDT: Transformer-Based Encoding-Decoding Translation Network for Multimodal Sentiment Analysis

Journal

COGNITIVE COMPUTATION
Volume 15, Issue 1, Pages 289-303

Publisher

SPRINGER
DOI: 10.1007/s12559-022-10073-9

Keywords

Multimodal sentiment analysis; Transformer; Multimodal fusion; Multimodal attention

In this study, a multimodal encoding-decoding translation network built on the Transformer is proposed to address the uneven impact that individual modalities have on sentiment analysis results. The model improves accuracy through a joint encoding-decoding scheme, a modality reinforcement cross-attention module, and a dynamic filtering mechanism.
Multimodal sentiment analysis is a popular but challenging research topic in natural language processing, and the individual modalities in a video can affect the sentiment analysis result to different degrees. In the temporal dimension, natural language sentiment is influenced by nonnatural language (acoustic and visual) sentiment, which may enhance or weaken the original sentiment of the current natural language. In addition, nonnatural language features are generally of poor quality, which fundamentally hinders multimodal fusion. To address these issues, we propose a multimodal encoding-decoding translation network with a Transformer and adopt a joint encoding-decoding method that treats text as the primary information and sound and image as secondary information. To reduce the negative impact of nonnatural language data on natural language data, we propose a modality reinforcement cross-attention module that converts nonnatural language features into natural language features, improving their quality and enabling better integration of multimodal features. Moreover, a dynamic filtering mechanism removes erroneous information generated during cross-modal interaction to further improve the final output. We evaluated the proposed method on two multimodal sentiment analysis benchmark datasets, MOSI and MOSEI, where it achieved accuracies of 89.3% and 85.9%, respectively, outperforming current state-of-the-art methods. Our model substantially improves multimodal fusion and analyzes human sentiment more accurately.
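
The abstract describes the modality reinforcement cross-attention module and the dynamic filtering mechanism only at a high level. The PyTorch sketch below is one plausible reading under stated assumptions, not the authors' implementation: text features serve as queries against an audio or visual key/value sequence, and a learned sigmoid gate stands in for the dynamic filter. The class name, gate formulation, and dimensions are all hypothetical.

```python
import torch
import torch.nn as nn

class ModalityReinforcementCrossAttention(nn.Module):
    """Hypothetical sketch: cross-attention that pulls a nonnatural-language
    modality (audio or vision) toward the text feature space, with the text
    features acting as queries (text as primary information)."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        # Assumed stand-in for the paper's dynamic filtering mechanism:
        # a sigmoid gate that suppresses unreliable cross-modal interactions.
        self.gate = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Sigmoid())

    def forward(self, text: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        # text:  (batch, T_text, d_model)  -- primary modality (queries)
        # other: (batch, T_other, d_model) -- audio or visual (keys/values)
        reinforced, _ = self.cross_attn(query=text, key=other, value=other)
        # Gate each reinforced feature against the original text feature,
        # filtering out erroneous cross-modal information before the
        # residual connection.
        g = self.gate(torch.cat([text, reinforced], dim=-1))
        return self.norm(text + g * reinforced)

if __name__ == "__main__":
    block = ModalityReinforcementCrossAttention(d_model=128)
    text = torch.randn(2, 20, 128)   # text sequence features
    audio = torch.randn(2, 50, 128)  # audio sequence features
    out = block(text, audio)
    print(out.shape)  # torch.Size([2, 20, 128])
```

Gating the reinforced features against the original text representation is one simple way to realize "filtering out error information" without discarding the residual text signal; the paper's actual filtering mechanism may differ.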
