Article

Adapting Multiple Distributions for Bridging Emotions from Different Speech Corpora

ENTROPY (2022)

Journal

ENTROPY

Volume 24, Issue 9, Pages -

Publisher

MDPI

DOI: 10.3390/e24091250

Keywords

cross-corpus speech emotion recognition; speech emotion recognition; domain adaptation; transfer learning; subspace learning

Funding

National Natural Science Foundation of China (NSFC) [U2003207, 61902064]
Jiangsu Frontier Technology Basic Research Project [BK20192004]
Zhishan Young Scholarship of Southeast University

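Automated Summary

This paper addresses cross-corpus speech emotion recognition (SER), in which the labeled source (training) samples and the target (testing) samples come from different emotion corpora and therefore follow mismatched feature distributions. To bridge this gap, the authors propose a transfer subspace learning method called multiple distribution-adapted regression (MDAR). MDAR learns a projection matrix that maps source speech features to emotion labels and adds a regularization term, multiple distribution adaption (MDA), so that the same projection remains applicable to target samples. In cross-corpus SER experiments on EmoDB, eNTERFACE, and CASIA, MDAR outperformed most recent state-of-the-art transfer subspace learning methods and several deep transfer learning methods.

Abstract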

In this paper, we focus on a challenging, but interesting, task in speech emotion recognition (SER), i.e., cross-corpus SER. Unlike conventional SER, a feature distribution mismatch may exist between the labeled source (training) and target (testing) speech samples in cross-corpus SER because they come from different speech emotion corpora, which degrades the performance of most well-performing SER methods. To address this issue, we propose a novel transfer subspace learning method called multiple distribution-adapted regression (MDAR) to bridge the gap between speech samples from different corpora. Specifically, MDAR aims to learn a projection matrix to build the relationship between the source speech features and emotion labels. A novel regularization term called multiple distribution adaption (MDA), consisting of a marginal and two conditional distribution-adapted operations, is designed to collaboratively enable such a discriminative projection matrix to be applicable to the target speech samples, regardless of speech corpus variance. Consequently, by resorting to the learned projection matrix, we are able to predict the emotion labels of target speech samples when only the source label information is given. To evaluate the proposed MDAR method, extensive cross-corpus SER tasks based on three different speech emotion corpora, i.e., EmoDB, eNTERFACE, and CASIA, were designed. Experimental results showed that the proposed MDAR outperformed most recent state-of-the-art transfer subspace learning methods and even performed better than several well-performing deep transfer learning methods in dealing with cross-corpus SER tasks.
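
The core idea in the abstract, a regression from source features to labels regularized so that source and target distributions align after projection, can be illustrated with a much-simplified sketch. This is not the authors' MDAR: it keeps only a marginal (mean-matching) term, omits the two conditional terms, and replaces whatever norms the paper uses with squared Frobenius norms so that a closed-form solution exists. The function names `fit_projection` and `predict` and the weights `lam` and `mu` are hypothetical, chosen for illustration only.

```python
import numpy as np

def fit_projection(Xs, Ys, Xt, lam=1.0, mu=0.1):
    """Simplified sketch (an assumption, not the paper's objective).

    Minimizes ||Ys - U^T Xs||_F^2 + lam * ||U^T (mean_s - mean_t)||^2 + mu * ||U||_F^2.
    Xs: (d, ns) source features, Ys: (c, ns) one-hot source labels,
    Xt: (d, nt) unlabeled target features. Returns U with shape (d, c).
    """
    d = Xs.shape[0]
    # Difference between source and target feature means (marginal mismatch).
    delta = Xs.mean(axis=1, keepdims=True) - Xt.mean(axis=1, keepdims=True)
    # Setting the gradient to zero gives a linear system in U.
    A = Xs @ Xs.T + lam * (delta @ delta.T) + mu * np.eye(d)
    return np.linalg.solve(A, Xs @ Ys.T)

def predict(U, X):
    """Assign each column of X to the class with the largest regression score."""
    return np.argmax(U.T @ X, axis=0)

# Synthetic demo: target features are a shifted copy of the source distribution.
rng = np.random.default_rng(0)
n_classes, dim = 3, 5
labels_s = rng.integers(0, n_classes, size=60)
centers = rng.normal(scale=3.0, size=(dim, n_classes))
Xs_demo = centers[:, labels_s] + rng.normal(size=(dim, 60))
Ys_demo = np.eye(n_classes)[:, labels_s]           # one-hot columns, shape (c, ns)
labels_t = rng.integers(0, n_classes, size=40)
Xt_demo = centers[:, labels_t] + rng.normal(size=(dim, 40)) + 0.5  # corpus shift

U_demo = fit_projection(Xs_demo, Ys_demo, Xt_demo, lam=10.0)
pred_t = predict(U_demo, Xt_demo)
print("target accuracy on synthetic data:", np.mean(pred_t == labels_t))
```

Only the source labels enter the fit; the target samples contribute through their mean alone. A fuller treatment along the lines the abstract describes would add class-conditional alignment terms, which typically require pseudo-labels for the target samples.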
