Article

Improvement of emotion classification performance using multi-resolution variational mode decomposition method

Journal

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.bspc.2023.105708

Keywords

Speech emotion recognition; Deep neural network; MRVMD; MRVMMFCC; MRVMAE; MRVMPE


Automated speech emotion recognition has gained popularity due to its wide range of applications. Researchers have been using different methods to improve emotion recognition performance. This study proposes a method based on multi-resolution variational mode decomposition to extract features for emotion classification; combined with a deep neural network classifier, it achieves better accuracy than existing methods.
Automated speech emotion recognition (SER) has been gaining popularity among researchers for three decades because of its wide range of real-world applications. It helps improve human-machine interaction and supports online marketing and education, customer relations, medical treatment, safe driving, online search, and more. Researchers adopt various methods to improve emotion recognition performance from speech signals, such as different combinations of features (acoustic, non-acoustic, or both) and classifiers (machine learning, deep learning, or both). Our study aims to improve emotion classification performance using features based on the multi-resolution variational mode decomposition (MRVMD) method. We first decompose each signal frame into several sub-signals, known as modes or intrinsic mode functions (IMFs), using the MRVMD method. Then the proposed features, multi-resolution variational mode mel-frequency cepstral coefficients (MRVMMFCC), multi-resolution variational mode approximate entropy (MRVMAE), and multi-resolution variational mode permutation entropy (MRVMPE), are extracted from the MRVMD-decomposed IMF signals. Finally, different combinations of the proposed features are used to classify emotion with a deep neural network (DNN) classifier. The experimental results show that the combination of all three proposed features (MRVMMFCC + MRVMAE + MRVMPE) performs better than the other combinations in recognizing emotion from speech signals. The proposed feature combination with a DNN classifier achieved emotion classification accuracies of 83.4%, 85.01%, and 90.51% on the SAVEE, EMOVO, and EMO-DB datasets, respectively. We found that the proposed MRVMD method performed better than state-of-the-art methods.
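To illustrate the two entropy measures underlying the MRVMAE and MRVMPE features, here is a minimal pure-Python sketch of permutation entropy (Bandt and Pompe) and approximate entropy (Pincus). This is not the authors' implementation: the MRVMD decomposition step that would produce the per-frame IMF inputs is omitted, and the parameter defaults (`order=3`, `m=2`, `r=0.2`) are illustrative choices, not values from the paper.

```python
import math


def permutation_entropy(signal, order=3, delay=1):
    """Normalized permutation entropy of a 1-D signal (Bandt-Pompe).

    Counts how often each ordinal pattern of length `order` occurs,
    then returns the Shannon entropy of the pattern distribution,
    normalized to [0, 1] by the maximum log(order!).
    """
    n = len(signal) - (order - 1) * delay
    counts = {}
    for i in range(n):
        window = tuple(signal[i + j * delay] for j in range(order))
        # Ordinal pattern: the ranking of the sample indices by value.
        pattern = tuple(sorted(range(order), key=window.__getitem__))
        counts[pattern] = counts.get(pattern, 0) + 1
    probs = [c / n for c in counts.values()]
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(math.factorial(order))


def approximate_entropy(signal, m=2, r=0.2):
    """Approximate entropy of a 1-D signal (Pincus).

    Measures regularity: low values mean repeating patterns of length
    `m` tend to stay similar (within tolerance `r`) when extended to
    length `m + 1`. Here `r` is an absolute tolerance; in practice it
    is often scaled by the signal's standard deviation.
    """
    def phi(m):
        n = len(signal) - m + 1
        templates = [signal[i:i + m] for i in range(n)]
        log_counts = []
        for t1 in templates:
            matches = sum(
                1 for t2 in templates
                if max(abs(a - b) for a, b in zip(t1, t2)) <= r
            )
            log_counts.append(math.log(matches / n))
        return sum(log_counts) / n

    return phi(m) - phi(m + 1)
```

In the pipeline the abstract describes, functions like these would be applied to each IMF of each speech frame, and the resulting values concatenated with the per-IMF MFCCs to form the feature vector fed to the DNN classifier.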

