Article

Multimodal emotion recognition from facial expression and speech based on feature fusion

Journal

MULTIMEDIA TOOLS AND APPLICATIONS
Volume 82, Issue 11, Pages 16359-16373

Publisher

SPRINGER
DOI: 10.1007/s11042-022-14185-0

Keywords

Multimodal emotion recognition; Attention mechanism; Deep learning; Feature fusion

This paper introduces a multimodal emotion recognition method that uses an attention mechanism to fuse audio and video features and to model their time series, effectively improving recognition accuracy.
Multimodal emotion recognition aims to identify an individual's emotional state from facial expression and speech information. Feature fusion can enrich the information carried by each modality and is therefore an important method for multimodal emotion recognition. However, fusion suffers from cross-modal synchronization problems and from overfitting caused by large feature dimensions. An attention mechanism is therefore introduced so that the network automatically attends to locally effective information; it performs both the audio-video feature fusion task and the temporal modeling task in the network. The main contributions are as follows: 1) a multi-head self-attention mechanism is used to fuse the audio and video features, avoiding the influence of prior information on the fusion results, and 2) a bidirectional gated recurrent unit (BiGRU) is used to model the time series of the fused features; furthermore, the autocorrelation coefficient along the time dimension is computed and used as an attention weight for fusion. Experimental results show that the adopted attention mechanism effectively improves the accuracy of multimodal emotion recognition.
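The abstract names two attention-based components: multi-head self-attention for audio-video feature fusion, and a BiGRU whose output is pooled with an autocorrelation-style temporal attention. The following is a minimal PyTorch sketch of such a pipeline; the feature dimensions, projection layers, and the mean-similarity form of the autocorrelation weighting are illustrative assumptions, not the authors' published architecture.

```python
# Minimal sketch of attention-based audio-video fusion followed by a BiGRU.
# Dimensions, layer sizes, and the pooling rule are illustrative assumptions.
import torch
import torch.nn as nn


class AttentionFusionGRU(nn.Module):
    def __init__(self, audio_dim=128, video_dim=512, d_model=256,
                 n_heads=4, n_classes=7):
        super().__init__()
        # Project both modalities into a shared embedding space.
        self.audio_proj = nn.Linear(audio_dim, d_model)
        self.video_proj = nn.Linear(video_dim, d_model)
        # Multi-head self-attention over the joint token sequence, so the
        # fusion weights are learned rather than fixed by prior information.
        self.self_attn = nn.MultiheadAttention(d_model, n_heads,
                                               batch_first=True)
        # Bidirectional GRU models the time series of the fused features.
        self.bigru = nn.GRU(d_model, d_model // 2, batch_first=True,
                            bidirectional=True)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, audio, video):
        # audio: (B, T, audio_dim), video: (B, T, video_dim)
        tokens = torch.cat([self.audio_proj(audio),
                            self.video_proj(video)], dim=1)   # (B, 2T, d)
        fused, _ = self.self_attn(tokens, tokens, tokens)     # (B, 2T, d)
        seq, _ = self.bigru(fused)                            # (B, 2T, d)

        # Autocorrelation-style temporal attention (an assumption about the
        # paper's "autocorrelation coefficient" weighting): score each time
        # step by its similarity to the sequence mean, then pool.
        mean = seq.mean(dim=1, keepdim=True)                  # (B, 1, d)
        scores = torch.softmax((seq * mean).sum(-1), dim=-1)  # (B, 2T)
        pooled = (seq * scores.unsqueeze(-1)).sum(dim=1)      # (B, d)
        return self.classifier(pooled)


if __name__ == "__main__":
    model = AttentionFusionGRU()
    audio = torch.randn(2, 50, 128)   # e.g. frame-level acoustic features
    video = torch.randn(2, 50, 512)   # e.g. per-frame facial features
    print(model(audio, video).shape)  # torch.Size([2, 7])
```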
