Article

Improving Cross-Corpus Speech Emotion Recognition with Adversarial Discriminative Domain Generalization (ADDoG)

Journal

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING
Volume 12, Issue 4, Pages 1055-1068

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TAFFC.2019.2916092

Keywords

Speech recognition; Training; Emotion recognition; Task analysis; Data models; Testing; Convergence; cross-corpus; adversarial; domain generalization

Funding

  1. National Science Foundation [CAREER-1651740]
  2. National Institute of Mental Health [R01MH108610, R34MH100404]
  3. Heinz C. Prechter Bipolar Research Fund
  4. Richard Tam Foundation at the University of Michigan

Abstract

Automatic speech emotion recognition provides computers with critical context to enable user understanding. While methods trained and tested within the same dataset have been shown to be successful, they often fail when applied to unseen datasets. To address this, recent work has focused on adversarial methods to find more generalized representations of emotional speech. However, many of these methods have issues converging, and only involve datasets collected in laboratory conditions. In this paper, we introduce Adversarial Discriminative Domain Generalization (ADDoG), which follows an easier-to-train "meet in the middle" approach. The model iteratively moves representations learned for each dataset closer to one another, improving cross-dataset generalization. We also introduce Multiclass ADDoG, or MADDoG, which extends the proposed method to more than two datasets simultaneously. Our results show consistent convergence for the introduced methods, with significantly improved results when not using labels from the target dataset. We also show how, in most cases, ADDoG and MADDoG can be used to improve upon baseline state-of-the-art methods when target dataset labels are added and in-the-wild data are considered. Even though our experiments focus on cross-corpus speech emotion, these methods could be used to remove unwanted factors of variation in other settings.
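
The core idea sketched in the abstract, a feature encoder trained jointly against an emotion classifier and a dataset critic so that corpus-specific information is adversarially removed, can be illustrated with a short example. The code below is a minimal, generic adversarial domain-generalization loop written in PyTorch; the module sizes, optimizer settings, critic schedule, and the 0.5 "middle" target are illustrative assumptions, and the paper's actual ADDoG/MADDoG objectives and architecture differ in detail.

```python
# Minimal sketch (assumption: PyTorch) of the adversarial domain-generalization
# idea described in the abstract: an encoder produces a representation that an
# emotion classifier can use but a dataset critic cannot, so representations
# from different corpora are pushed toward one another.  Dimensions, names,
# the critic schedule, and the 0.5 "middle" target are illustrative choices,
# not the authors' implementation.
import torch
import torch.nn as nn

FEAT_DIM, REP_DIM, N_EMOTIONS = 40, 64, 3           # toy sizes (assumptions)

encoder = nn.Sequential(nn.Linear(FEAT_DIM, REP_DIM), nn.ReLU())
emotion_clf = nn.Linear(REP_DIM, N_EMOTIONS)        # emotion prediction head
critic = nn.Linear(REP_DIM, 1)                      # dataset critic (corpus A vs. B)

opt_main = torch.optim.Adam(
    list(encoder.parameters()) + list(emotion_clf.parameters()), lr=1e-3)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
ce = nn.CrossEntropyLoss()

def train_step(x_src, y_src, x_tgt, critic_steps=5, adv_weight=0.1):
    """One iteration: (1) teach the critic to separate the two corpora,
    (2) update encoder + classifier to predict emotion while confusing the critic."""
    # (1) critic update: source corpus -> 0, target corpus -> 1
    for _ in range(critic_steps):
        with torch.no_grad():                        # encoder is frozen for this step
            z_src, z_tgt = encoder(x_src), encoder(x_tgt)
        d_loss = (bce(critic(z_src).squeeze(1), torch.zeros(len(x_src))) +
                  bce(critic(z_tgt).squeeze(1), torch.ones(len(x_tgt))))
        opt_critic.zero_grad()
        d_loss.backward()
        opt_critic.step()

    # (2) encoder/classifier update: supervised emotion loss on the labeled corpus,
    # plus an adversarial term that drives both corpora toward the critic's
    # decision boundary (target 0.5), so the representations "meet in the middle"
    z_src, z_tgt = encoder(x_src), encoder(x_tgt)
    emo_loss = ce(emotion_clf(z_src), y_src)
    to_middle = lambda z: bce(critic(z).squeeze(1), torch.full((len(z),), 0.5))
    g_loss = emo_loss + adv_weight * (to_middle(z_src) + to_middle(z_tgt))
    opt_main.zero_grad()
    g_loss.backward()
    opt_main.step()
    return emo_loss.item(), d_loss.item()

# toy usage with random features standing in for acoustic inputs
x_s, y_s = torch.randn(8, FEAT_DIM), torch.randint(0, N_EMOTIONS, (8,))
x_t = torch.randn(8, FEAT_DIM)
print(train_step(x_s, y_s, x_t))
```

Alternating several critic updates with one encoder update is a common way to stabilize this kind of adversarial training, which relates to the convergence concern the abstract raises; the paper's specific training procedure should be consulted for the actual ADDoG and MADDoG algorithms.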

