Article

Learning Class-Aligned and Generalized Domain-Invariant Representations for Speech Emotion Recognition

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TETCI.2020.2972926

Keywords

Domain adversarial training; generalization; class alignment; speech emotion recognition

Funding

  1. National Science Foundation of China [61772188]
  2. National Key R&D Program of China [2018YFC0831800]

Abstract

Although recent research on speech emotion recognition has demonstrated that learning domain-invariant features provides an elegant solution to domain mismatch, the features learned by existing methods lack the generalization capability to capture latent information from datasets. We propose two novel domain adaptation methods, the generalized domain adversarial neural network (GDANN) and the class-aligned GDANN (CGDANN), to learn generalized domain-invariant representations for emotion recognition. GDANN and CGDANN, which are derived from multitask learning (MTL), consist of three tasks. The main task is to recognize the emotional category to which the input belongs. The remaining two tasks are auxiliary. One uses a variational autoencoder to model the input distribution, which encourages the model to learn the distribution of latent representations. The other learns representations common to the different domains, which the domain classifier has difficulty distinguishing. Through a gradient reversal layer, the gradient of the domain classifier guides the shared representations of the source and target domains to approximate each other. To evaluate the effectiveness of the proposed methods, we conduct several experiments with the IEMOCAP and MSP-IMPROV datasets. The results show that the proposed methods perform well compared with state-of-the-art methods. Notably, CGDANN uses a small quantity of labeled target-domain samples to align the distribution representation and obtains the best performance among the compared methods. We further visualize the representations learned by the proposed methods and find that the representations of the source and target domains converge with a low variance.
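
The gradient reversal mechanism mentioned in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the scalar "network" (one feature weight `w`, one domain-classifier weight `v`), the names `GradientReversal` and `domain_step`, and the scaling factor `lam` are all assumptions made for illustration. The sketch shows only the core idea: the layer acts as the identity on the forward pass and flips the sign of the gradient on the backward pass, so the domain classifier is trained normally while the shared features are pushed to make domains harder to tell apart.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class GradientReversal:
    """Identity on the forward pass; multiplies gradients by -lam on the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam  # hypothetical scaling factor, not taken from the paper

    def forward(self, features):
        return list(features)

    def backward(self, grad_output):
        return [-self.lam * g for g in grad_output]


def domain_step(x, domain_label, w, v, lam=1.0):
    """One manual forward/backward pass: feature -> reversal layer -> domain classifier.

    x: scalar input, w: feature-extractor weight, v: domain-classifier weight
    (a toy scalar model assumed for illustration).
    Returns (loss, grad_v, grad_w).
    """
    grl = GradientReversal(lam)
    h = w * x                            # shared feature
    (h_rev,) = grl.forward([h])          # unchanged on the forward pass
    p = sigmoid(v * h_rev)               # predicted probability of domain 1
    loss = -(domain_label * math.log(p) + (1 - domain_label) * math.log(1 - p))
    dz = p - domain_label                # d(loss)/d(logit) for binary cross-entropy
    grad_v = dz * h_rev                  # classifier receives the ordinary gradient
    (dh,) = grl.backward([dz * v])       # gradient is reversed before reaching the features
    grad_w = dh * x
    return loss, grad_v, grad_w
```

With `lam=1.0`, the gradient that reaches `w` is exactly the negative of what plain backpropagation would give, while the gradient for `v` is unchanged. In a real model the same behaviour would normally be expressed as a custom autograd operation in a deep learning framework rather than by hand.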
