Article

Speech Emotion Recognition Based on Self-Attention Weight Correction for Acoustic and Text Features

Journal

IEEE ACCESS
Volume 10, Issue -, Pages 115732-115743

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2022.3219094

Keywords

Feature extraction; Speech recognition; Acoustics; Emotion recognition; Data mining; Text recognition; Speech emotion recognition; confidence measure; automatic speech recognition; self-attention mechanism

Funding

  1. JST SPRING [JPMJSP2124]


This research focuses on improving SER performance with a BLSTM and self-attention, using the proposed SAWC method to adjust the importance weights of speech segments and words with a high probability of ASR error, and achieves higher accuracy in experiments.
Speech emotion recognition (SER) is essential for understanding a speaker's intention. Recently, some groups have attempted to improve SER performance using a bidirectional long short-term memory (BLSTM) to extract features from speech sequences and a self-attention mechanism to focus on the important parts of the speech sequences. SER also benefits from combining the information in speech with text, which can be obtained automatically using an automatic speech recognizer (ASR), further improving SER performance. However, ASR performance deteriorates in the presence of emotion in speech. Although there is a method to improve ASR performance on emotional speech, it requires fine-tuning the ASR, which incurs a high computational cost and discards cues to the presence of emotion in speech segments that can be helpful for SER. To solve these problems, we propose a BLSTM- and self-attention-based SER method using self-attention weight correction (SAWC) with confidence measures. This method is applied to the acoustic and text feature extractors in SER to adjust the importance weights of speech segments and words with a high possibility of ASR error. Our proposed SAWC reduces the importance of misrecognized words in the text features while emphasizing the importance of the speech segments containing these words in the acoustic features. Our experimental results on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset reveal that our proposed method achieves a weighted average accuracy of 76.6%, outperforming other state-of-the-art methods. Furthermore, we investigated the behavior of our proposed SAWC in each of the feature extractors.
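The abstract describes the SAWC mechanism only at a high level: per-word ASR confidence measures are used to down-weight likely misrecognized words in the text branch while up-weighting the aligned speech segments in the acoustic branch. The sketch below (Python/NumPy) illustrates that idea; the function name sawc_correct, the parameter alpha, and the specific linear correction formula are illustrative assumptions, not the paper's actual formulation.

import numpy as np

def sawc_correct(attn_text, attn_acoustic, asr_confidence, alpha=1.0):
    """Illustrative self-attention weight correction (SAWC) sketch.

    attn_text      : (T,) self-attention weights over recognized words
    attn_acoustic  : (T,) self-attention weights over the speech segments
                     aligned to those words
    asr_confidence : (T,) per-word ASR confidence measures in [0, 1]
    alpha          : correction strength (hypothetical parameter)
    """
    error_prob = 1.0 - asr_confidence

    # De-emphasize likely misrecognized words in the text feature extractor.
    corrected_text = attn_text * (1.0 - alpha * error_prob)

    # Emphasize the corresponding segments in the acoustic feature extractor.
    corrected_acoustic = attn_acoustic * (1.0 + alpha * error_prob)

    # Renormalize both attention distributions to sum to 1.
    corrected_text = corrected_text / corrected_text.sum()
    corrected_acoustic = corrected_acoustic / corrected_acoustic.sum()
    return corrected_text, corrected_acoustic

# Example: the third word has a low ASR confidence, so its text weight drops
# and the weight of its aligned speech segment rises.
attn_text = np.array([0.25, 0.25, 0.25, 0.25])
attn_acoustic = np.array([0.25, 0.25, 0.25, 0.25])
confidence = np.array([0.95, 0.90, 0.30, 0.92])
print(sawc_correct(attn_text, attn_acoustic, confidence))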

