Article

A BiLSTM-Transformer and 2D CNN Architecture for Emotion Recognition from Speech

Journal

ELECTRONICS
Volume 12, Issue 19, Pages: -

Publisher

MDPI
DOI: 10.3390/electronics12194034

Keywords

emotion recognition from speech; transformer; attention mechanism; bidirectional LSTM; convolutional neural network; audio feature extraction

Abstract

The significance of emotion recognition technology continues to grow, and research in this field enables artificial intelligence to understand and respond accurately to human emotions. This study aims to enhance the efficacy of emotion recognition from speech by using dimensionality reduction algorithms for visualization, effectively outlining emotion-specific audio features. As a model for emotion recognition, we propose a new architecture that combines a bidirectional long short-term memory (BiLSTM)-Transformer with a 2D convolutional neural network (CNN). The BiLSTM-Transformer processes audio features to capture the sequence of speech patterns, while the 2D CNN handles Mel-spectrograms to capture the spatial details of the audio. To validate the performance of the model, 10-fold cross-validation is used. The methodology proposed in this study was applied to Emo-DB and RAVDESS, two major databases for emotion recognition from speech, and achieved high unweighted accuracy rates of 95.65% and 80.19%, respectively. These results indicate that using the proposed Transformer-based deep learning model with appropriate feature selection can enhance performance in emotion recognition from speech.
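The abstract describes a two-branch design: a BiLSTM followed by a Transformer encoder over frame-level audio features, a 2D CNN over Mel-spectrograms, and a fused classifier. The following is a minimal PyTorch sketch of that idea, not the authors' released code; all layer sizes, pooling choices, and names (FEATURE_DIM, N_EMOTIONS, EmotionClassifier, etc.) are illustrative assumptions, with only the overall branch structure taken from the abstract.

```python
# Minimal sketch (assumed, not the paper's implementation) of a BiLSTM-Transformer
# branch over frame-level audio features plus a 2D CNN branch over Mel-spectrograms,
# fused for emotion classification.
import torch
import torch.nn as nn

FEATURE_DIM = 40      # frame-level feature size (assumed, e.g. MFCC-like features)
N_EMOTIONS = 7        # Emo-DB distinguishes 7 emotion classes

class BiLSTMTransformerBranch(nn.Module):
    """Sequence branch: BiLSTM followed by a Transformer encoder."""
    def __init__(self, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.bilstm = nn.LSTM(FEATURE_DIM, d_model // 2, batch_first=True,
                              bidirectional=True)
        enc_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=n_layers)

    def forward(self, x):                 # x: (batch, frames, FEATURE_DIM)
        h, _ = self.bilstm(x)             # (batch, frames, d_model)
        h = self.encoder(h)               # self-attention over the frame sequence
        return h.mean(dim=1)              # temporal average pooling -> (batch, d_model)

class CNNBranch(nn.Module):
    """Spectrogram branch: small 2D CNN over (1, mel_bins, frames) inputs."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, out_dim)

    def forward(self, spec):              # spec: (batch, 1, mel_bins, frames)
        z = self.conv(spec).flatten(1)    # (batch, 64)
        return self.proj(z)               # (batch, out_dim)

class EmotionClassifier(nn.Module):
    """Fuse both branches and classify into emotion categories."""
    def __init__(self):
        super().__init__()
        self.seq_branch = BiLSTMTransformerBranch()
        self.cnn_branch = CNNBranch()
        self.head = nn.Linear(128 + 128, N_EMOTIONS)

    def forward(self, feats, spec):
        fused = torch.cat([self.seq_branch(feats), self.cnn_branch(spec)], dim=1)
        return self.head(fused)           # logits over emotion classes

# Shape check with dummy tensors (2 utterances, 300 frames, 64 Mel bins).
model = EmotionClassifier()
logits = model(torch.randn(2, 300, FEATURE_DIM), torch.randn(2, 1, 64, 300))
print(logits.shape)                       # torch.Size([2, 7])
```

In this reading, late fusion by concatenation is one plausible way to combine the sequential and spatial representations; the paper's exact fusion strategy, feature set, and hyperparameters should be taken from the article itself.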

