4.5 Article

End-to-End Learning for Multimodal Emotion Recognition in Video With Adaptive Loss

Journal

IEEE MULTIMEDIA
Volume 28, Issue 2, Pages 59-66

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/MMUL.2021.3080305

Keywords

Feature extraction; Convolution; Emotion recognition; Data mining; Face recognition; Visualization; Training; Multimodal Learning; Emotion Recognition; Sentiment Analysis; End-to-End Learning; Affective Computing

Funding

  1. Basic Science Research Program through the National Research Foundation of Korea (NRF) - Ministry of Education [NRF-2018R1D1A3A03000947, NRF-2020R1A4A1019191]

Abstract

This work presents an approach for emotion recognition in video through the interaction of visual, audio, and language information in an end-to-end learning manner, with three key points: 1) a lightweight feature extractor, 2) an attention strategy, and 3) an adaptive loss. We propose a lightweight deep architecture of approximately 1 MB for feature extraction, the most crucial part of emotion recognition systems. Temporal relationships among features are modeled with a temporal convolutional network instead of an RNN-based architecture, to exploit parallelism and avoid vanishing gradients. The attention strategy is employed to weight the temporal networks' outputs along the time dimension and to learn each modality's contribution to the final result. The interaction between modalities is further shaped by training with an adaptive objective function that adjusts the network's gradients. Experimental results on a large-scale Korean emotion recognition dataset demonstrate the superiority of our method when the attention mechanism and adaptive loss are employed during training.
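
The abstract describes the architecture only at a high level. The following is a minimal, hypothetical PyTorch sketch (not the authors' released code) of the central idea: per-modality temporal-convolution encoders whose pooled features are fused with a learned softmax attention weight per modality. The layer sizes, the 7-class output, and the three input dimensionalities are illustrative assumptions, and the adaptive loss is omitted because its exact form is not given in the abstract.

# Hypothetical sketch: attention-weighted fusion of per-modality
# temporal-convolution features, loosely following the abstract.
import torch
import torch.nn as nn

class ModalityTCN(nn.Module):
    """1-D temporal convolution block standing in for the per-modality TCN."""
    def __init__(self, in_dim, hid_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_dim, hid_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hid_dim, hid_dim, kernel_size=3, padding=1),
            nn.ReLU(),
        )

    def forward(self, x):                    # x: (batch, time, in_dim)
        h = self.net(x.transpose(1, 2))      # (batch, hid_dim, time)
        return h.mean(dim=2)                 # temporal pooling -> (batch, hid_dim)

class AttentionFusion(nn.Module):
    """Learns a softmax weight per modality and fuses the pooled features."""
    def __init__(self, dims, hid_dim=64, num_classes=7):
        super().__init__()
        self.encoders = nn.ModuleList([ModalityTCN(d, hid_dim) for d in dims])
        self.attn = nn.Linear(hid_dim, 1)
        self.classifier = nn.Linear(hid_dim, num_classes)

    def forward(self, inputs):               # list of (batch, time, dim) tensors
        feats = torch.stack(
            [enc(x) for enc, x in zip(self.encoders, inputs)], dim=1
        )                                     # (batch, n_modalities, hid_dim)
        weights = torch.softmax(self.attn(feats), dim=1)  # (batch, n_modalities, 1)
        fused = (weights * feats).sum(dim=1)              # attention-weighted sum
        return self.classifier(fused)

# Toy usage with assumed visual (512-d), audio (128-d), and text (300-d) features.
model = AttentionFusion(dims=[512, 128, 300])
visual = torch.randn(4, 30, 512)
audio = torch.randn(4, 30, 128)
text = torch.randn(4, 30, 300)
logits = model([visual, audio, text])        # (4, 7) emotion logits

The learned attention weights give a direct, per-sample estimate of each modality's contribution to the prediction, which is the role the abstract assigns to the attention strategy.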

