Article

Using DTW neural-based MFCC warping to improve emotional speech recognition

Journal

NEURAL COMPUTING & APPLICATIONS
Volume 21, Issue 7, Pages 1765-1773

Publisher

SPRINGER LONDON LTD
DOI: 10.1007/s00521-011-0620-8

Keywords

Emotion; Speech recognition; Frequency warping; Dynamic time warping; Neural network

The performance of automatic speech recognition (ASR) systems degrades significantly on emotional speech. One way to improve the recognition rate is to neutralize the Mel-frequency cepstral coefficients (MFCCs) of emotional speech, MFCCs being the most frequently used features in ASR. The neutralized MFCCs are then passed to a hidden Markov model (HMM)-based ASR system that has been trained on nonemotional speech. In this paper, the frequency range most affected by emotion is determined, and frequency warping is applied during MFCC calculation, in the Mel filterbank module and/or the discrete cosine transform (DCT) module. The warping factor is determined by a combined structure that uses the dynamic time warping (DTW) technique and a multi-layer perceptron (MLP) neural network. Experimental results show that the recognition rate for the anger and happiness emotional states improves when warping is performed in either of these modules. When warping is performed in both the Mel filterbank and the DCT modules, the recognition rate for speech in the anger and happiness emotional states improves by 6.4% and 3.0%, respectively.
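
As a rough illustration of the DTW step named in the abstract, the sketch below aligns two sequences of feature vectors (for example, frames of an emotional utterance against frames of a neutral one) using textbook dynamic time warping. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name `dtw_align`, the Euclidean frame distance, and the unconstrained step pattern are all choices made here for illustration. The abstract does not say how the DTW output is combined with the MLP to produce the warping factor, so that part is not shown.

```python
import math


def dtw_align(seq_a, seq_b):
    """Align two sequences of equal-dimension feature vectors with DTW.

    Returns (total_cost, path), where path is a list of (i, j) index
    pairs matching frame i of seq_a to frame j of seq_b.
    Assumption: Euclidean distance between frames, no slope constraints.
    """
    n, m = len(seq_a), len(seq_b)
    inf = math.inf

    # cost[i][j] = minimal accumulated distance aligning the first i
    # frames of seq_a with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # advance both sequences
                cost[i - 1][j],      # advance seq_a only
                cost[i][j - 1],      # advance seq_b only
            )

    # Backtrack from the end to recover the alignment path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    # Toy one-dimensional "frames"; real input would be MFCC vectors.
    neutral = [(0.0,), (1.0,), (2.0,)]
    emotional = [(0.0,), (1.0,), (1.0,), (2.0,)]
    total, alignment = dtw_align(neutral, emotional)
    print(total, alignment)
```

The alignment path pairs each frame of one utterance with its best-matching frame in the other, which is the kind of frame-level correspondence that could be used to compare emotional and neutral spectra before estimating a warping factor.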