Accent classification from an emotional speech in clean and noisy environments

MULTIMEDIA TOOLS AND APPLICATIONS (2023)

Journal

MULTIMEDIA TOOLS AND APPLICATIONS

Volume 82, Issue 3, Pages 3485-3508

Publisher

SPRINGER

DOI: 10.1007/s11042-022-13236-w

Keywords

Accent classification; Spectral features; Machine learning classifier; Emotion classification

Automated Summary

This study aims to build effective accent recognition systems based on emotional speech. By applying statistical aggregation functions on different features and conducting experiments using clean and noisy speech signals, it is found that some features perform well on noisy data while the robustness of others depends on whether there is noisy training data.

Abstract

The performance of speech emotion recognition (SER) systems suffers when emotional speech is spoken in different accents. One possible solution is to identify the accent beforehand and use this knowledge in the SER task. The present work is a novel attempt to build effective accent recognition systems based on emotional speech. To this end, statistical aggregation functions (such as mean, standard deviation, and kurtosis) are applied to frame-level feature representations, namely perceptual linear prediction (PLP), log filterbank energies (LFBE), Mel frequency cepstral coefficients (MFCC), spectral subband centroid (SSC), constant-Q cepstral coefficients (CQCC), chroma vector, and Mel frequency discrete wavelet coefficients (MFDWC), to obtain utterance-level features from CREMA-D, an emotional speech dataset. The performance of the features with different standard classifiers is evaluated in experiments on clean and noisy speech signals. The experimental results show that the SSC features perform well on noisy data only when the classifier is trained with noisy data. In contrast, the combined MFDWC features perform well on noisy data for both clean and noisy training data, which hints at the noise-robustness of this feature set, whereas SSC can only be called conditionally robust. We hope this work will initiate a new line of research in emotion recognition.
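
The aggregation step described in the abstract, turning a variable-length sequence of frame-level features into one fixed-length utterance-level vector, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `aggregate_frames`, the choice of exactly three statistics (mean, standard deviation, excess kurtosis), and the random stand-in for 13 MFCC-like coefficients are all assumptions made for illustration.

```python
import numpy as np


def aggregate_frames(frames: np.ndarray) -> np.ndarray:
    """Collapse a (n_frames, n_coeffs) matrix into one utterance-level vector.

    Hypothetical helper: concatenates the per-coefficient mean, standard
    deviation, and excess kurtosis computed across frames.
    """
    frames = np.asarray(frames, dtype=float)
    mean = frames.mean(axis=0)
    std = frames.std(axis=0)  # population standard deviation (ddof=0)

    # Fourth central moment, normalised by std**4, minus 3 (excess kurtosis).
    centered = frames - mean
    fourth_moment = (centered ** 4).mean(axis=0)
    safe_std = np.where(std > 0, std, 1.0)  # avoid division by zero
    kurt = np.where(std > 0, fourth_moment / safe_std ** 4 - 3.0, 0.0)

    return np.concatenate([mean, std, kurt])


# Stand-in for real frame-level features: 200 frames x 13 coefficients.
rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 13))

utterance_vector = aggregate_frames(frames)
print(utterance_vector.shape)  # (39,)
```

Whatever the number of frames, the output length depends only on the number of coefficients and the number of statistics, which is what lets utterances of different durations feed the same standard classifier.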
