Article

Domain Invariant Feature Learning for Speaker-Independent Speech Emotion Recognition

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TASLP.2022.3178232

Keywords

Training; Representation learning; Emotion recognition; Databases; Training data; Speech recognition; Feature extraction; Speech emotion recognition; speaker independent; adversarial learning; unsupervised domain adaptation; multi-source domain adaptation

Funding

  1. National Natural Science Foundation of China (NSFC) [U2003207, 61921004, 61902064, 62076195]
  2. Jiangsu Frontier Technology Basic Research Project [BK20192004]
  3. Zhishan Young Scholarship of Southeast University
  4. Scientific Research Foundation of Graduate School of Southeast University [YBPY1955]
  5. German Research Foundation (DFG) [442218748]

Abstract

In this paper, we propose a novel domain invariant feature learning (DIFL) method for speaker-independent speech emotion recognition (SER). The basic idea of DIFL is to learn speaker-invariant emotion features by eliminating the domain shifts between training and testing data caused by different speakers, from the perspective of multi-source unsupervised domain adaptation (UDA). Specifically, we embed a hierarchical alignment layer with a strong-weak distribution alignment strategy into the feature extraction block to first reduce, as much as possible, the discrepancy in the feature distributions of speech samples across different speakers. Furthermore, multiple discriminators in the discriminator block are used to confuse the speaker information of the emotion features, both within the training data and between the training and testing data. In this way, a multi-domain invariant representation of emotional speech can be gradually and adaptively achieved by updating the network parameters. We conduct extensive experiments on three public datasets, i.e., Emo-DB, eNTERFACE, and CASIA, to evaluate the SER performance of the proposed method. The experimental results show that the proposed method is superior to state-of-the-art methods.
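The core idea of reducing the discrepancy between speakers' feature distributions can be illustrated with a toy sketch. This is not the authors' implementation (which uses deep networks with adversarial discriminators); it is a minimal, assumption-laden illustration in plain Python of what a first-moment distribution-alignment loss measures: the function names (`feature_mean`, `mean_discrepancy`) and the synthetic "speakers" are hypothetical.

```python
import math
import random

def feature_mean(features):
    """Mean vector of a list of equal-length feature vectors."""
    dim = len(features[0])
    return [sum(f[i] for f in features) / len(features) for i in range(dim)]

def mean_discrepancy(feats_a, feats_b):
    """Euclidean distance between the mean features of two speakers.

    A greatly simplified stand-in for a distribution-alignment loss:
    driving this toward zero makes the two speakers' feature
    distributions agree in their first moment.
    """
    ma, mb = feature_mean(feats_a), feature_mean(feats_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ma, mb)))

# Toy example: two "speakers" whose 3-D features differ by a fixed offset,
# mimicking a speaker-induced domain shift.
random.seed(0)
spk_a = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
spk_b = [[x + 2.0 for x in f] for f in spk_a]      # shifted copy: domain shift
aligned = [[x - 2.0 for x in f] for f in spk_b]    # "alignment" undoes the shift

print(mean_discrepancy(spk_a, spk_b))    # large: distributions differ
print(mean_discrepancy(spk_a, aligned))  # near zero: distributions aligned
```

In the paper's setting, the subtraction above is replaced by a learned feature extractor whose parameters are updated so that the discrepancy (and the discriminators' ability to tell speakers apart) shrinks during training.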
