Article

Automated accurate speech emotion recognition system using twine shuffle pattern and iterative neighborhood component analysis techniques

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 211

Publisher

ELSEVIER
DOI: 10.1016/j.knosys.2020.106547

Keywords

Twine shuffle pattern; Speech emotion recognition; Classification; Bioacoustics; SVM


This study proposes a speech emotion recognition method that uses a cryptographic shuffle structure for feature generation and iterative neighborhood component analysis for feature selection. Experimental results show high classification accuracies on multiple public databases, with potential for application to large-scale databases and healthcare settings.
Speech emotion recognition is a challenging research problem for knowledge-based systems, and various methods have been proposed to reach high classification capability. To achieve high classification performance in speech emotion recognition, a nonlinear multi-level feature generation model based on a cryptographic structure is presented. The novelty of this work is the use of a cryptographic structure, called a shuffle box, for feature generation, combined with iterative neighborhood component analysis for feature selection. The proposed method has three main stages: (i) multi-level feature generation using the tunable Q wavelet transform (TQWT), (ii) feature extraction with the twine shuffle pattern (twine-shuf-pat), and (iii) feature selection and classification using iterative neighborhood component analysis (INCA). The TQWT is a multi-level wavelet transformation method used to generate high-level, medium-level, and low-level wavelet coefficients. The proposed twine-shuf-pat technique extracts features from the decomposed wavelet coefficients, and the INCA feature selector retains the clinically significant ones. The performance of the model is validated on four public speech emotion databases (RAVDESS Speech, Emo-DB (Berlin), SAVEE, and EMOVO). Our twine-shuf-pat and INCA based method yielded 87.43%, 90.09%, 84.79%, and 79.08% classification accuracies on the RAVDESS, Emo-DB (Berlin), SAVEE, and EMOVO corpora, respectively, with a 10-fold cross-validation strategy. A mixed database created from the four public speech emotion databases yielded 80.05% classification accuracy. The obtained speech emotion model is ready to be tested on large databases and can be used in healthcare applications. (C) 2020 Elsevier B.V. All rights reserved.
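The abstract outlines a three-stage pipeline (decompose, extract patterns, select and classify). The minimal Python sketch below illustrates only the overall flow, under loudly stated assumptions: PyWavelets' multilevel DWT stands in for TQWT (which has no standard Python implementation), a fixed random permutation stands in for the shuffle box since the twine-shuf-pat details are not given in the abstract, and mutual-information ranking with an iterative kNN evaluation loop approximates INCA. It is an illustration of the pipeline shape, not the authors' exact method.

# Hedged sketch of the three-stage pipeline described in the abstract.
# Stand-ins (assumptions, NOT the paper's exact algorithms):
#   - pywt.wavedec replaces TQWT;
#   - a fixed permutation ("shuffle box") drives a generic local binary
#     pattern, since twine-shuf-pat is not specified in the abstract;
#   - mutual-information ranking + iterative kNN cross-validation
#     approximates INCA feature selection.
import numpy as np
import pywt
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

RNG = np.random.default_rng(0)
SHUFFLE = RNG.permutation(8)          # hypothetical shuffle-box permutation

def shuf_pattern_hist(x, width=9):
    """Binary local pattern over shuffled neighbors -> 256-bin histogram."""
    windows = np.lib.stride_tricks.sliding_window_view(x, width)
    center = windows[:, width // 2]
    neighbors = np.delete(windows, width // 2, axis=1)[:, SHUFFLE]
    bits = (neighbors > center[:, None]).astype(np.uint8)
    codes = bits @ (1 << np.arange(8))          # 8-bit pattern codes
    return np.bincount(codes, minlength=256) / len(codes)

def extract_features(signal, levels=3):
    """Stages (i)+(ii): multilevel decomposition, one histogram per band."""
    coeffs = pywt.wavedec(signal, "db4", level=levels)
    return np.concatenate([shuf_pattern_hist(c) for c in coeffs])

def iterative_selection(X, y, max_k=128):
    """Stage (iii): rank features, keep the top-k with best CV accuracy."""
    order = np.argsort(mutual_info_classif(X, y))[::-1]
    best_k, best_acc = 1, 0.0
    for k in range(8, max_k + 1, 8):
        acc = cross_val_score(KNeighborsClassifier(1),
                              X[:, order[:k]], y, cv=10).mean()
        if acc > best_acc:
            best_k, best_acc = k, acc
    return order[:best_k]

With the per-utterance feature vectors stacked into X and emotion labels in y, idx = iterative_selection(X, y) returns the retained feature columns; the SVM named in the paper's keywords could then replace the kNN used inside the selection loop as the final classifier.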
