Article

Emotional speech recognition using CNN and deep learning techniques

Journal

APPLIED ACOUSTICS
Volume 211

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.apacoust.2023.109492

Keywords

Emotional speech recognition; CNN; Deep learning; MFCC


Abstract

Emotions are an integral part of human life: they serve as a means of expressing one's opinions and signaling one's physical and emotional well-being. A Speech Emotion Recognition (SER) system extracts features from audio signals and predicts the emotional tone of the speaker. Emotions are commonly grouped into categories such as anger, happiness, sadness, and a neutral state, and with suitable training data and resources a system can be built to recognize a speaker's emotional state against these categories.

Speech emotion detection typically relies on spectral and prosodic features, since both carry information relevant to the speaker's emotional state. Mel-Frequency Cepstral Coefficients (MFCC) are among the most widely used spectral features. Prosodic features such as pitch, loudness, and frequency can be used to train a machine-learning model that distinguishes the emotions underlying a given speech signal. Because the pitch of an audio signal is distinctive, it can also be used to separate different audio signals and, together with the other selected features, to classify the gender of the speaker. Support Vector Machines (SVM) are supervised learning models used for classification and regression; in an SER system they are commonly applied to identifying the speaker's gender. Some studies instead rely on Radial Basis Function (RBF) and backpropagation networks, which can recognize human emotions in a signal from specific selected features.

This study presents a Speech Emotion Recognition system that outperforms an existing system in terms of data, feature selection, and methodology. Its goal is to identify emotions in speech more accurately, with an average accuracy of 78% and fewer false positives. © 2023 Elsevier Ltd. All rights reserved.
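As a concrete illustration of the spectral features the abstract describes, the sketch below extracts MFCCs from a speech clip with librosa. The file name speech.wav, the 16 kHz sampling rate, and the choice of 13 coefficients are illustrative assumptions, not details taken from the paper.

```python
import librosa
import numpy as np

# Illustrative parameters (not from the paper): 16 kHz audio, 13 coefficients.
y, sr = librosa.load("speech.wav", sr=16000)  # hypothetical input file

# MFCC matrix of shape (n_mfcc, n_frames): one 13-dim spectral vector per frame.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# A common fixed-length summary: per-coefficient mean and standard deviation
# across frames, giving a 26-dim feature vector suitable for a classifier.
features = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
print(features.shape)  # (26,)
```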
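The prosodic cues the abstract mentions (pitch, loudness) can be estimated per frame in a similar way. A minimal sketch, again assuming librosa and the same hypothetical speech.wav:

```python
import librosa
import numpy as np

y, sr = librosa.load("speech.wav", sr=16000)  # hypothetical input file

# Fundamental frequency (pitch) per frame via the pYIN tracker;
# unvoiced frames are returned as NaN.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)

# Loudness proxy: root-mean-square energy per frame.
rms = librosa.feature.rms(y=y)[0]

# Simple utterance-level prosodic summary (mean pitch over voiced frames,
# pitch range, mean energy) that could sit alongside the MFCC statistics.
prosody = np.array([
    np.nanmean(f0),
    np.nanmax(f0) - np.nanmin(f0),
    rms.mean(),
])
```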
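The abstract notes that SVMs serve as supervised classifiers in this setting (for speaker gender, and in related work for emotion itself). A minimal scikit-learn sketch of that step, assuming X is a matrix of per-utterance feature vectors like those above and y holds the corresponding labels; both are hypothetical placeholders:

```python
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: (n_utterances, n_features) feature matrix; y: labels such as
# "anger", "happiness", "sadness", "neutral". Both are assumed inputs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Feature scaling matters for SVMs; an RBF kernel is a common default.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```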
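The title and keywords indicate a CNN-based model, but the abstract does not describe its architecture. The Keras sketch below is therefore only a generic example of a small CNN over MFCC "images" (coefficients × frames); every layer size and the four-class output are assumptions, not the paper's design.

```python
from tensorflow import keras

# Input: MFCC matrices treated as single-channel images, assumed here to be
# 13 coefficients x 100 frames. Output: 4 emotion classes (anger, happiness,
# sadness, neutral). All sizes are illustrative, not the paper's architecture.
model = keras.Sequential([
    keras.layers.Input(shape=(13, 100, 1)),
    keras.layers.Conv2D(16, (3, 3), padding="same", activation="relu"),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(4, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```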
