4.6 Article

TWACapsNet: a capsule network with two-way attention mechanism for speech emotion recognition

Journal

SOFT COMPUTING
Volume -, Issue -, Pages -

Publisher

SPRINGER
DOI: 10.1007/s00500-023-08957-5

Keywords

Speech emotion recognition; Attention mechanism; Neural networks


This study proposes a Capsule Network with Two-Way Attention Mechanism (TWACapsNet) for the Speech Emotion Recognition (SER) problem. Experimental results demonstrate that the proposed method outperforms other neural network models on multiple SER datasets, and the combination of the two ways contributes to the higher and more stable performance of TWACapsNet.
Speech Emotion Recognition (SER) is a challenging task, and a typical convolutional neural network (CNN) does not handle speech data well when applied directly, because a CNN tends to capture local information and ignore the overall characteristics of the signal. This paper proposes a Capsule Network with Two-Way Attention Mechanism (TWACapsNet for short) for the SER problem. TWACapsNet accepts spatial and spectral features as inputs, and a convolutional layer and a capsule layer process these two types of features separately, in two ways. Two attention mechanisms are then designed to enhance the information obtained from the spatial and spectral features. Finally, the results of the two ways are combined to form the final decision. Experiments on three typical SER data sets show that the proposed method outperforms widely deployed neural network models. Furthermore, the combination of the two ways contributes to the higher and more stable performance of TWACapsNet.
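
The two-way structure described in the abstract can be illustrated with a small sketch. The abstract gives no layer sizes, attention formulas, or fusion rule, so everything below is an assumption made for illustration: the dot-product attention in `attention_pool`, the per-class linear maps in `branch_scores`, the averaging in `two_way_predict`, and all names and dimensions are hypothetical, and the convolutional layers are omitted entirely. Only the `squash` function follows the standard capsule-network nonlinearity. It should be read as a toy outline of "two branches, attention, capsule lengths, late fusion", not as the authors' implementation.

```python
import math
import random

def squash(vec):
    """Capsule squashing: keep the direction of vec, map its length into [0, 1)."""
    sq_norm = sum(v * v for v in vec)
    if sq_norm == 0.0:
        return [0.0 for _ in vec]
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * v for v in vec]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(frames, query):
    """Assumed attention: weight each frame by its dot product with a query vector."""
    weights = softmax([sum(f * q for f, q in zip(frame, query)) for frame in frames])
    dim = len(frames[0])
    return [sum(w * frame[d] for w, frame in zip(weights, frames)) for d in range(dim)]

def branch_scores(frames, query, class_maps):
    """One 'way': pooled features -> one capsule per class -> capsule lengths."""
    pooled = attention_pool(frames, query)
    scores = []
    for matrix in class_maps:  # each matrix: capsule_dim rows of length feature_dim
        capsule = squash([sum(w * p for w, p in zip(row, pooled)) for row in matrix])
        scores.append(math.sqrt(sum(c * c for c in capsule)))
    return scores

def two_way_predict(spatial, spectral, spatial_params, spectral_params):
    """Run both ways and fuse them; plain averaging is an assumed fusion rule."""
    scores_a = branch_scores(spatial, *spatial_params)
    scores_b = branch_scores(spectral, *spectral_params)
    fused = [0.5 * (a + b) for a, b in zip(scores_a, scores_b)]
    return fused.index(max(fused)), fused

# Demo on random stand-in data (not real speech features).
rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def rand_params(feature_dim, num_classes=4, capsule_dim=3):
    query = rand_matrix(1, feature_dim)[0]
    class_maps = [rand_matrix(capsule_dim, feature_dim) for _ in range(num_classes)]
    return query, class_maps

spatial = rand_matrix(10, 6)   # 10 frames of 6-dimensional "spatial" features
spectral = rand_matrix(10, 8)  # 10 frames of 8-dimensional "spectral" features
label, fused = two_way_predict(spatial, spectral, rand_params(6), rand_params(8))
print(label, [round(s, 3) for s in fused])
```

Because each capsule length lies in [0, 1), the averaged scores from the two ways stay on a common scale, which is one simple reason a late-fusion average is a plausible stand-in here.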
