Article

Attention guided 3D CNN-LSTM model for accurate speech based emotion recognition

Journal

APPLIED ACOUSTICS
Volume 182, Issue -, Pages -

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.apacoust.2021.108260

Keywords

Speech emotion recognition; Attention; 3D CNN-LSTM model


In this paper, a novel approach based on an attention guided 3D convolutional neural network (CNN)-long short-term memory (LSTM) model is proposed for speech based emotion recognition. The proposed attention guided 3D CNN-LSTM model is trained in an end-to-end fashion. The input speech signals are initially resampled and pre-processed for noise removal and emphasis of the high frequencies. Then, spectrogram, Mel-frequency cepstral coefficient (MFCC), cochleagram and fractal dimension methods are used to convert the input speech signals into speech images. The obtained images are concatenated into four-dimensional volumes and used as input to the developed 28-layer attention integrated 3D CNN-LSTM model. The 3D CNN-LSTM model contains six 3D convolutional layers, two batch normalization (BN) layers, five Rectified Linear Unit (ReLU) layers, three 3D max pooling layers, one attention layer, one LSTM layer, one flatten layer, one dropout layer, and two fully connected layers. The attention layer is connected to the 3D convolutional layers. Three datasets, namely the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), RML and SAVEE, are used in the experimental work, as is a mixture of these datasets. Classification accuracy, sensitivity, specificity and F1-score are used to evaluate the developed method. The obtained results are also compared with some recently published results, and the proposed method outperforms the compared methods. (C) 2021 Elsevier Ltd. All rights reserved.
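As a rough illustration of the pre-processing pipeline the abstract describes (emphasizing high frequencies, converting the signal into a spectrogram-style image, and stacking several feature images into one input volume), the following NumPy sketch shows one possible interpretation. The function names, frame sizes, and the 0.97 pre-emphasis coefficient are common conventions assumed here, not details taken from the paper, and the repeated spectrogram merely stands in for the four feature types (spectrogram, MFCC, cochleagram, fractal dimension):

```python
import numpy as np

def pre_emphasis(signal, alpha=0.97):
    # First-order high-pass filter: y[n] = x[n] - alpha * x[n-1],
    # a standard way to emphasize high frequencies in speech
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def spectrogram(signal, frame_len=256, hop=128):
    # Magnitude spectrogram via a short-time FFT with a Hamming window
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, frame_len//2 + 1)

# toy 1-second, 16 kHz tone standing in for a resampled speech clip
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
emphasized = pre_emphasis(sig)
spec = spectrogram(emphasized)

# Stack several per-utterance feature images along a new axis,
# mimicking the four-dimensional input volume described in the
# abstract (the same image is repeated here as a placeholder for
# the spectrogram, MFCC, cochleagram and fractal-dimension images).
volume = np.stack([spec] * 4, axis=-1)
print(volume.shape)  # -> (124, 129, 4)
```

Batching such volumes over utterances would then yield the 4D (plus batch) tensors consumed by a 3D convolutional front end.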

