4.7 Article

DSTCNet: Deep Spectro-Temporal-Channel Attention Network for Speech Emotion Recognition

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TNNLS.2023.3304516

Keywords

Channel attention; representation learning; spectro-temporal attention; speech emotion recognition (SER)


This article introduces a deep spectro-temporal-channel network (DSTCNet) for speech emotion recognition that improves representational ability by integrating multiple spectro-temporal-channel attention modules. Experimental results show that DSTCNet outperforms traditional CNN-based methods and several state-of-the-art methods in emotion recognition.
Speech emotion recognition (SER) plays an important role in human-computer interaction, where it can make systems more interactive and improve the user experience. Existing approaches tend to apply deep learning networks directly to distinguish emotions. Among them, the convolutional neural network (CNN) is the most commonly used method for learning emotional representations from spectrograms. However, a CNN does not explicitly model the associations among features along the spectral, temporal, and channel axes, or their relative relevance, which limits representation learning. In this article, we propose a deep spectro-temporal-channel network (DSTCNet) to improve the representational ability for speech emotion. The proposed DSTCNet integrates several spectro-temporal-channel (STC) attention modules into a general CNN. Specifically, the proposed STC module infers a 3-D attention map along the dimensions of time, frequency, and channel, so that the network can focus on crucial time frames, frequency ranges, and feature channels. Finally, experiments were conducted on the Berlin emotional database (EmoDB) and the interactive emotional dyadic motion capture (IEMOCAP) database. The results show that DSTCNet outperforms traditional CNN-based methods and several state-of-the-art methods.
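
To make the idea of a 3-D attention map over time, frequency, and channel more concrete, the following is a minimal sketch only. The abstract does not specify how the STC module computes its map, so everything here is an assumption: the function name `stc_attention_sketch`, the parameter-free design (mean pooling over the other two axes, then a sigmoid), and the multiplicative combination of the three per-axis weights are all hypothetical. The actual module presumably uses learned layers and may differ substantially.

```python
import math


def sigmoid(v):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def stc_attention_sketch(x):
    """Apply a factorized 3-D attention map to x, shaped [channel][freq][time].

    ASSUMPTION: per-axis mean pooling followed by a sigmoid, combined by
    multiplication. Illustrative only; this is not the paper's STC module.
    Returns (attended_features, attention_map), both shaped like x.
    """
    n_c, n_f, n_t = len(x), len(x[0]), len(x[0][0])

    # Pool over the other two axes: one descriptor per channel, per
    # frequency bin, and per time frame.
    chan = [sum(x[c][f][t] for f in range(n_f) for t in range(n_t)) / (n_f * n_t)
            for c in range(n_c)]
    freq = [sum(x[c][f][t] for c in range(n_c) for t in range(n_t)) / (n_c * n_t)
            for f in range(n_f)]
    time = [sum(x[c][f][t] for c in range(n_c) for f in range(n_f)) / (n_c * n_f)
            for t in range(n_t)]

    # Squash each descriptor into an attention weight in (0, 1).
    a_c = [sigmoid(v) for v in chan]
    a_f = [sigmoid(v) for v in freq]
    a_t = [sigmoid(v) for v in time]

    # Combine the three 1-D weight vectors into a single 3-D attention map.
    attn = [[[a_c[c] * a_f[f] * a_t[t] for t in range(n_t)]
             for f in range(n_f)]
            for c in range(n_c)]

    # Reweight the input features element-wise.
    out = [[[x[c][f][t] * attn[c][f][t] for t in range(n_t)]
            for f in range(n_f)]
           for c in range(n_c)]
    return out, attn


if __name__ == "__main__":
    # Toy feature map: 2 channels, 3 frequency bins, 4 time frames.
    features = [[[0.1 * (c + f - t) for t in range(4)]
                 for f in range(3)]
                for c in range(2)]
    attended, attn = stc_attention_sketch(features)
    print(len(attn), len(attn[0]), len(attn[0][0]))
```

In this sketch, a channel, frequency bin, or time frame with a larger average activation receives a larger weight, so the corresponding region of the feature map is emphasized. A learned module would replace the fixed pooling-plus-sigmoid step with trainable layers inserted between the convolutional blocks of the CNN.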

