4.7 Article

Towards Contrastive Context-Aware Conversational Emotion Recognition

Journal

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING
Volume 13, Issue 4, Pages 1879-1891

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TAFFC.2022.3212994

Keywords

Conversational emotion recognition; conversational context; semantic constraint; contrastive learning

Funding

  1. Natural Science Foundation of Beijing [4222036]
  2. Huawei Technologies [TC20201228005]


Conversational Emotion Recognition (CER) aims to classify the emotion of each utterance in a conversation. Current CER models may not sufficiently capture the effects of contextual factors. To address this issue, a semantic-guided contrastive context-aware CER method (C3ER) is proposed to enhance the accuracy and robustness of emotion recognition.
Conversational Emotion Recognition (CER) aims to classify the emotion of each utterance in a conversation. The emotion of a target utterance is jointly determined by multiple factors in its conversational context, such as conversation topics, emotion labels, and intra/inter-speaker influences. An important research question then arises: can current CER models sufficiently capture the effects of these contextual factors? To answer this question, we carry out an empirical study on four representative CER models using a context-replacement methodology. The results suggest that these models either exhibit a label-copying effect or rely heavily on the intra/inter-speaker dependency structure within the conversation, but do not make good use of the semantics carried by the conversational context. There is thus a high risk that they overfit to individual factors while lacking a holistic understanding of the semantic context. To tackle this problem, we propose a semantic-guided contrastive context-aware CER method, namely C3ER, to augment/regularize a backbone CER model, which can be any neural CER framework. Specifically, C3ER takes the hidden states of utterances from the CER model as input, extracts contrast pairs consisting of utterances that are relevant and irrelevant to the conversational context of a target utterance, and uses contrastive learning to establish a soft semantic constraint between the target utterance and its context. It is then jointly trained with the main CER model, forcing the model to gain a semantic understanding of the context. Extensive experimental results show that C3ER can significantly boost the accuracy and improve the robustness of the representative CER models.
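
The abstract describes the contrastive constraint only at a high level, so the following is a minimal sketch of one way such a constraint could look, not the authors' implementation. It assumes an InfoNCE-style loss, cosine similarity, a mean-pooled context vector as the anchor, and a weighted sum for joint training; the names `cosine`, `mean_pool`, `info_nce`, `joint_loss`, and the parameters `temperature` and `weight` are all hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two hidden-state vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def mean_pool(vectors: List[Vector]) -> List[float]:
    """Summarize the context as the element-wise mean (an assumption)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def info_nce(anchor: Vector, positive: Vector,
             negatives: List[Vector], temperature: float = 0.1) -> float:
    """InfoNCE-style loss: small when the anchor is closer to the
    positive (relevant utterance) than to the negatives (irrelevant ones)."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - logits[0]


def joint_loss(classification_loss: float, contrastive_loss: float,
               weight: float = 0.5) -> float:
    """Hypothetical joint objective: backbone loss plus a weighted
    contrastive regularizer."""
    return classification_loss + weight * contrastive_loss


# Toy hidden states standing in for the backbone CER model's outputs.
context_states = [[1.0, 0.0], [0.9, 0.1]]
relevant = [1.0, 0.1]                      # utterance that fits the context
irrelevant = [[-1.0, 0.2], [0.0, -1.0]]    # e.g. drawn from other dialogues

anchor = mean_pool(context_states)
loss = info_nce(anchor, relevant, irrelevant)
total = joint_loss(classification_loss=1.0, contrastive_loss=loss)
```

In this sketch the constraint is "soft" in the sense that it only adds a penalty term to the backbone's classification loss; the backbone architecture itself is left unchanged, which matches the abstract's claim that C3ER can wrap any neural CER framework.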
