Article

Automated accurate speech emotion recognition system using twine shuffle pattern and iterative neighborhood component analysis techniques

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 211, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.knosys.2020.106547

Keywords

Twine shuffle pattern; Speech emotion recognition; Classification; Bioacoustics; SVM


An innovative speech emotion recognition method is proposed in this study, using a cryptographic shuffle structure for feature generation and iterative neighborhood component analysis for feature selection. Experimental results show that the method attains high classification accuracies on multiple public databases, with potential applications to large-scale databases and healthcare settings.
Speech emotion recognition is a challenging research problem for knowledge-based systems, and various methods have been proposed to reach high classification capability. To achieve high classification performance in speech emotion recognition, a nonlinear multi-level feature generation model based on a cryptographic structure is presented. The novelty of this work is the use of a cryptographic structure called a shuffle box for feature generation, and of iterative neighborhood component analysis for feature selection. The proposed method has three main stages: (i) multi-level decomposition using the tunable Q-factor wavelet transform (TQWT), (ii) feature generation with the twine shuffle pattern (twine-shuf-pat), and (iii) selection of discriminative features using iterative neighborhood component analysis (INCA), followed by classification. TQWT is a multi-level wavelet transformation method used to generate high-level, medium-level, and low-level wavelet coefficients. The proposed twine-shuf-pat technique extracts features from the decomposed wavelet coefficients, and the INCA feature selector retains the clinically significant features. The performance of the model is validated on four public speech emotion databases (RAVDESS Speech, Emo-DB (Berlin), SAVEE, and EMOVO). The developed twine-shuf-pat and INCA based method yielded 87.43%, 90.09%, 84.79%, and 79.08% classification accuracies on the RAVDESS, Emo-DB (Berlin), SAVEE, and EMOVO corpora, respectively, with a 10-fold cross-validation strategy. A mixed database created from the four public speech emotion databases yielded 80.05% classification accuracy. The obtained speech emotion model is ready to be tested on larger databases and can be used in healthcare applications. (C) 2020 Elsevier B.V. All rights reserved.
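The INCA stage described in the abstract iteratively evaluates growing feature subsets and keeps the one that minimizes classification error. A minimal sketch of that idea follows; it is not the authors' implementation. As assumptions, it substitutes a simple absolute-correlation ranking for the paper's NCA-derived feature weights and a leave-one-out 1-nearest-neighbor classifier for the SVM, and the function name `inca_select` and all parameters are hypothetical.

```python
import numpy as np

def inca_select(X, y, min_k=2, max_k=None):
    """Iterative feature selection in the spirit of INCA:
    rank features by a per-feature weight, then evaluate growing
    top-k subsets and keep the subset with the best accuracy.
    Correlation ranking stands in for NCA feature weights."""
    n, d = X.shape
    max_k = max_k or d
    # Rank features by absolute correlation with the label (stand-in weight).
    weights = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(d)])
    order = np.argsort(-weights)

    def loo_1nn_accuracy(F):
        # Leave-one-out 1-nearest-neighbor accuracy on feature matrix F.
        dist = np.linalg.norm(F[:, None, :] - F[None, :, :], axis=2)
        np.fill_diagonal(dist, np.inf)  # exclude each sample itself
        return float(np.mean(y[np.argmin(dist, axis=1)] == y))

    best_k, best_acc = min_k, -1.0
    for k in range(min_k, max_k + 1):
        acc = loo_1nn_accuracy(X[:, order[:k]])
        if acc > best_acc:
            best_k, best_acc = k, acc
    return order[:best_k], best_acc

# Toy demo: two informative features among ten, rest pure noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 10))
X[:, 0] += y * 3.0  # informative feature
X[:, 1] -= y * 3.0  # informative feature
selected, acc = inca_select(X, y)
print(selected, round(acc, 3))
```

In the toy run, the two informative features are ranked first and the selected subset recovers near-perfect leave-one-out accuracy, illustrating why iterating over ranked subsets rather than fixing a feature count is useful.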

