Article

Cascaded Convolutional Neural Network Architecture for Speech Emotion Recognition in Noisy Conditions

SENSORS (2021)

Journal

SENSORS

Volume 21, Issue 13

Publisher

MDPI

DOI: 10.3390/s21134399

Keywords

cascaded DnCNN-CNN; speech emotion recognition; residual learning

Categories

Chemistry, Analytical; Engineering, Electrical & Electronic; Instruments & Instrumentation

Funding

National Research Foundation of Korea (NRF) - Korean government [2017S1A6A3A01078538]
National Research Foundation of Korea [2017S1A6A3A01078538] Funding Source: Korea Institute of Science & Technology Information (KISTI), National Science & Technology Information Service (NTIS)

Summary

The study introduces a cascaded denoising CNN (DnCNN)-CNN architecture for classifying emotions in Korean and German speech under noisy conditions. Experimental results show that the DnCNN-CNN outperforms the baseline CNN in overall accuracy for both languages, providing new insights into speech denoising and emotion recognition.

Abstract

Convolutional neural networks (CNNs) are a state-of-the-art technique for speech emotion recognition. However, CNNs have mostly been applied to noise-free emotional speech data, and limited evidence is available for their applicability in emotional speech denoising. In this study, a cascaded denoising CNN (DnCNN)-CNN architecture is proposed to classify emotions from Korean and German speech in noisy conditions. The proposed architecture consists of two stages. In the first stage, the DnCNN exploits the concept of residual learning to perform denoising; in the second stage, the CNN performs the classification. The classification results for real datasets show that the DnCNN-CNN outperforms the baseline CNN in overall accuracy for both languages. For Korean speech, the DnCNN-CNN achieves an accuracy of 95.8%, whereas the accuracy of the CNN is marginally lower (93.6%). For German speech, the DnCNN-CNN has an overall accuracy of 59.3-76.6%, whereas the CNN has an overall accuracy of 39.4-58.1%. These results demonstrate the feasibility of applying the DnCNN with residual learning to speech denoising and the effectiveness of the CNN-based approach in speech emotion recognition. Our findings provide new insights into speech emotion recognition in adverse conditions and have implications for language-universal speech emotion recognition.
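
The two-stage cascade described in the abstract can be sketched in a few lines. In residual learning, the stage-1 network is trained to estimate the noise component of its input rather than the clean signal, and the denoised signal is obtained by subtracting that estimate from the input; stage 2 then classifies the result. The sketch below is a minimal illustration only, not the authors' implementation: `toy_noise_estimator` (a fixed high-pass filter) and `toy_classifier` (an energy threshold) are hypothetical stand-ins for the trained DnCNN and CNN, and the labels, kernel, and threshold are assumptions chosen for illustration.

```python
from typing import Callable, List

Signal = List[float]


def conv1d_same(x: Signal, kernel: List[float]) -> Signal:
    """Zero-padded 1-D cross-correlation that keeps the input length."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(x):
                acc += w * x[j]
        out.append(acc)
    return out


def residual_denoise(noisy: Signal, noise_estimator: Callable[[Signal], Signal]) -> Signal:
    """Stage 1 (residual learning): estimate the noise, then subtract it from the input."""
    residual = noise_estimator(noisy)
    return [y - r for y, r in zip(noisy, residual)]


def cascade(noisy: Signal,
            noise_estimator: Callable[[Signal], Signal],
            classifier: Callable[[Signal], str]) -> str:
    """Stage 1 denoises the input; stage 2 classifies the denoised signal."""
    return classifier(residual_denoise(noisy, noise_estimator))


def toy_noise_estimator(y: Signal) -> Signal:
    # Hypothetical stand-in for a trained DnCNN: a fixed high-pass filter
    # that treats fast fluctuations as "noise".
    return conv1d_same(y, [-1.0 / 3.0, 2.0 / 3.0, -1.0 / 3.0])


def toy_classifier(x: Signal) -> str:
    # Hypothetical stand-in for a trained CNN classifier: thresholds the
    # mean energy. The labels are placeholders, not emotion categories.
    energy = sum(v * v for v in x) / len(x)
    return "high_energy" if energy > 1.0 else "low_energy"


if __name__ == "__main__":
    noisy_input = [2.0, 2.5, 1.5, 2.2, 1.8, 2.1]
    print(residual_denoise(noisy_input, toy_noise_estimator))
    print(cascade(noisy_input, toy_noise_estimator, toy_classifier))
```

Passing the two stages in as callables keeps the cascade itself independent of how each stage is implemented, so the toy functions could be swapped for trained models without changing `cascade`.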
