Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language Speech Emotion Recognition

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING (2023)

Journal

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING

Volume 14, Issue 3, Pages 1912-1926

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TAFFC.2022.3167013

Keywords

Training; Adaptation models; Emotion recognition; Speech recognition; Australia; Task analysis; Generators; Speech emotion recognition; self-supervised learning; domain adaptation; adversarial learning

Categories

Computer Science, Artificial Intelligence; Computer Science, Cybernetics

AI Summary

Despite recent advancements in speech emotion recognition (SER) within a single corpus, the performance of these systems degrades significantly for cross-corpus and cross-language scenarios. This is due to the lack of generalization in SER systems towards unseen conditions. Adversarial methods have been used to address this issue, but many only focus on cross-corpus SER and ignore the cross-language performance degradation. This study proposes an adversarial dual discriminator (ADDi) network and a self-supervised ADDi (sADDi) network to improve cross-corpus and cross-language SER without requiring target data labels. Experimental results demonstrate improved performance compared to state-of-the-art methods.

Abstract

Despite recent advances in speech emotion recognition (SER) within a single-corpus setting, the performance of these SER systems degrades significantly in cross-corpus and cross-language scenarios. The key reason is that SER systems generalise poorly to unseen conditions. To address this issue, recent studies use adversarial methods to learn domain-generalised representations for cross-corpus and cross-language SER. However, many of these methods focus only on cross-corpus SER and do not address the performance degradation in cross-language SER, where the domain gap between source- and target-language data is larger. This contribution proposes an adversarial dual discriminator (ADDi) network that uses a three-player adversarial game to learn generalised representations without requiring any target data labels. We also introduce a self-supervised ADDi (sADDi) network that utilises self-supervised pre-training with unlabelled data. We propose synthetic data generation as a pretext task in sADDi, which enables the network to produce emotionally discriminative and domain-invariant representations and provides complementary synthetic data to augment the system. The proposed model is rigorously evaluated using five publicly available datasets in three languages and compared with multiple studies on cross-corpus and cross-language SER. Experimental results demonstrate that the proposed model achieves improved performance compared to state-of-the-art methods.
