Proceedings Paper

M3: MultiModal Masking applied to sentiment analysis

Journal

INTERSPEECH 2021
Volume -, Issue -, Pages 2876-2880

Publisher

ISCA-INT SPEECH COMMUNICATION ASSOC
DOI: 10.21437/Interspeech.2021-1739

Keywords

multimodal; masking; sentiment analysis; dropout; CMU-MOSEI

Funding

  1. European Regional Development Fund of the European Union
  2. Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH - CREATE - INNOVATE (project safety4all) [T1EDK04248]

Abstract

A common issue when training multimodal architectures is that not all modalities contribute equally to the model's prediction and the network tends to over-rely on the strongest modality. In this work, we present M3, a training procedure based on modality masking for deep multimodal architectures. During network training, we randomly select one modality and mask its features, forcing the model to make its prediction in the absence of this modality. This structured regularization allows the network to better exploit complementary information in the input modalities. We implement M3 as a generic layer that can be integrated with any multimodal architecture. Our experiments show that M3 outperforms other masking schemes and improves performance over our strong baseline. We evaluate M3 for multimodal sentiment analysis on CMU-MOSEI, achieving results comparable to the state of the art.
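
The abstract describes M3 as a generic masking layer placed in a multimodal network: during training, one modality is picked at random and its features are zeroed out before fusion. Below is a minimal PyTorch sketch of that idea. The class name `ModalityMasking`, the masking probability `p`, and the per-forward-pass masking granularity are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of a modality-masking layer in the spirit of M3 (assumed details).
# Each modality is a feature tensor, e.g. of shape (batch, time, dim).
import random
import torch
import torch.nn as nn


class ModalityMasking(nn.Module):
    """During training, zero out the features of one randomly chosen modality."""

    def __init__(self, p: float = 0.2):
        super().__init__()
        self.p = p  # probability of masking a modality on a given forward pass (assumed value)

    def forward(self, modalities: list[torch.Tensor]) -> list[torch.Tensor]:
        # Like dropout, masking is applied only in training mode.
        if not self.training or random.random() >= self.p:
            return modalities
        idx = random.randrange(len(modalities))  # pick one modality uniformly at random
        return [
            torch.zeros_like(feats) if i == idx else feats
            for i, feats in enumerate(modalities)
        ]


# Hypothetical usage: insert the layer between the unimodal encoders and the fusion module.
masker = ModalityMasking(p=0.2)
text = torch.randn(8, 50, 300)    # placeholder text features
audio = torch.randn(8, 50, 74)    # placeholder audio features
visual = torch.randn(8, 50, 35)   # placeholder visual features
masker.train()
text, audio, visual = masker([text, audio, visual])
```

Because the layer only manipulates the modality feature tensors, it can be dropped into any multimodal architecture without changing the encoders or the fusion mechanism, which matches the "generic layer" framing in the abstract.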
