Article

Improving Cross-Corpus Speech Emotion Recognition with Adversarial Discriminative Domain Generalization (ADDoG)

Journal

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING
Volume 12, Issue 4, Pages 1055-1068

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TAFFC.2019.2916092

Keywords

Speech recognition; Training; Emotion recognition; Task analysis; Data models; Testing; Convergence; cross-corpus; adversarial; domain generalization

Funding

  1. National Science Foundation [CAREER-1651740]
  2. National Institute of Mental Health [R01MH108610, R34MH100404]
  3. Heinz C. Prechter Bipolar Research Fund
  4. Richard Tam Foundation at the University of Michigan

Abstract

Automatic speech emotion recognition provides computers with critical context to enable user understanding. While methods trained and tested within the same dataset have been shown to be successful, they often fail when applied to unseen datasets. To address this, recent work has focused on adversarial methods to find more generalized representations of emotional speech. However, many of these methods have issues converging, and only involve datasets collected in laboratory conditions. In this paper, we introduce Adversarial Discriminative Domain Generalization (ADDoG), which follows an easier-to-train "meet in the middle" approach. The model iteratively moves representations learned for each dataset closer to one another, improving cross-dataset generalization. We also introduce Multiclass ADDoG, or MADDoG, which extends the proposed method to more than two datasets simultaneously. Our results show consistent convergence for the introduced methods, with significantly improved results when not using labels from the target dataset. We also show how, in most cases, ADDoG and MADDoG can be used to improve upon baseline state-of-the-art methods when target dataset labels are added and in-the-wild data are considered. Even though our experiments focus on cross-corpus speech emotion, these methods could be used to remove unwanted factors of variation in other settings.
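
The core idea sketched in the abstract, a feature encoder trained jointly against an emotion classifier and a dataset critic so that corpus-specific information is adversarially removed, can be illustrated with a short example. The code below is a minimal, generic adversarial domain-generalization loop written in PyTorch; the module sizes, optimizer settings, critic schedule, and the 0.5 "middle" target are illustrative assumptions, and the paper's actual ADDoG/MADDoG objectives and architecture differ in detail.

```python
# Minimal sketch (assumption: PyTorch) of the adversarial domain-generalization
# idea described in the abstract: an encoder produces a representation that an
# emotion classifier can use but a dataset critic cannot, so representations
# from different corpora are pushed toward one another.  Dimensions, names,
# the critic schedule, and the 0.5 "middle" target are illustrative choices,
# not the authors' implementation.
import torch
import torch.nn as nn

FEAT_DIM, REP_DIM, N_EMOTIONS = 40, 64, 3           # toy sizes (assumptions)

encoder = nn.Sequential(nn.Linear(FEAT_DIM, REP_DIM), nn.ReLU())
emotion_clf = nn.Linear(REP_DIM, N_EMOTIONS)        # emotion prediction head
critic = nn.Linear(REP_DIM, 1)                      # dataset critic (corpus A vs. B)

opt_main = torch.optim.Adam(
    list(encoder.parameters()) + list(emotion_clf.parameters()), lr=1e-3)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
ce = nn.CrossEntropyLoss()

def train_step(x_src, y_src, x_tgt, critic_steps=5, adv_weight=0.1):
    """One iteration: (1) teach the critic to separate the two corpora,
    (2) update encoder + classifier to predict emotion while confusing the critic."""
    # (1) critic update: source corpus -> 0, target corpus -> 1
    for _ in range(critic_steps):
        with torch.no_grad():                        # encoder is frozen for this step
            z_src, z_tgt = encoder(x_src), encoder(x_tgt)
        d_loss = (bce(critic(z_src).squeeze(1), torch.zeros(len(x_src))) +
                  bce(critic(z_tgt).squeeze(1), torch.ones(len(x_tgt))))
        opt_critic.zero_grad()
        d_loss.backward()
        opt_critic.step()

    # (2) encoder/classifier update: supervised emotion loss on the labeled corpus,
    # plus an adversarial term that drives both corpora toward the critic's
    # decision boundary (target 0.5), so the representations "meet in the middle"
    z_src, z_tgt = encoder(x_src), encoder(x_tgt)
    emo_loss = ce(emotion_clf(z_src), y_src)
    to_middle = lambda z: bce(critic(z).squeeze(1), torch.full((len(z),), 0.5))
    g_loss = emo_loss + adv_weight * (to_middle(z_src) + to_middle(z_tgt))
    opt_main.zero_grad()
    g_loss.backward()
    opt_main.step()
    return emo_loss.item(), d_loss.item()

# toy usage with random features standing in for acoustic inputs
x_s, y_s = torch.randn(8, FEAT_DIM), torch.randint(0, N_EMOTIONS, (8,))
x_t = torch.randn(8, FEAT_DIM)
print(train_step(x_s, y_s, x_t))
```

Alternating several critic updates with one encoder update is a common way to stabilize this kind of adversarial training, which relates to the convergence concern the abstract raises; the paper's specific training procedure should be consulted for the actual ADDoG and MADDoG algorithms.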

