Article

Using DTW neural-based MFCC warping to improve emotional speech recognition

NEURAL COMPUTING & APPLICATIONS (2012)

Journal

NEURAL COMPUTING & APPLICATIONS

Volume 21, Issue 7, Pages 1765-1773

Publisher

SPRINGER LONDON LTD

DOI: 10.1007/s00521-011-0620-8

Keywords

Emotion; Speech recognition; Frequency warping; Dynamic time warping; Neural network

Abstract

The performance of automatic speech recognition (ASR) systems degrades significantly on emotional speech. One way to improve the recognition rate is to neutralize the Mel-frequency cepstral coefficients (MFCCs) of emotional speech, since MFCCs are the most frequently used features in ASR. The neutralized MFCCs are then used in a hidden Markov model (HMM)-based ASR system that has been trained on nonemotional speech. In this paper, the frequency range most affected by emotion is determined, and frequency warping is applied during the calculation of the MFCCs. The warping is performed in the Mel filterbank module and/or the discrete cosine transform (DCT) module of the MFCC calculation. To determine the warping factor, a combined structure using the dynamic time warping (DTW) technique and a multi-layer perceptron (MLP) neural network is used. Experimental results show that the recognition rate for the anger and happiness emotional states improves when the warping is performed in either of these modules alone. When the warping is performed in both the Mel filterbank and the DCT modules, the recognition rate of speech in the anger and happiness emotional states improves by 6.4% and 3.0%, respectively.
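The abstract names DTW as one component of the structure that determines the warping factor, but it does not say how the alignment is configured. As a rough illustration only, the sketch below shows textbook DTW between two sequences of feature vectors. The function name `dtw_distance`, the Euclidean local cost, the unconstrained step pattern, and the toy one-dimensional sequences are all assumptions made for this example and are not taken from the paper.

```python
import math


def dtw_distance(a, b):
    """Return the accumulated DTW cost between two sequences of vectors.

    Assumptions (not from the paper): Euclidean local cost, the basic
    step pattern (diagonal, vertical, horizontal), no window constraint.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    # cost[i][j] is the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # advance in both sequences
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
            )
    return cost[n][m]


# Hypothetical toy data: each frame is a 1-dimensional "feature vector".
neutral = [(0.0,), (1.0,), (2.0,), (1.0,)]
# Same contour as `neutral`, but stretched in time.
stretched = [(0.0,), (1.0,), (1.0,), (2.0,), (2.0,), (1.0,)]
# Same timing as `neutral`, but every value is offset by 1.
shifted = [(1.0,), (2.0,), (3.0,), (2.0,)]

print(dtw_distance(neutral, stretched))
print(dtw_distance(neutral, shifted))
```

In this toy case the time-stretched copy aligns at zero cost, while the offset copy does not, because DTW compensates for differences in timing but not for differences in the feature values themselves. In the setting the abstract describes, the sequences would be MFCC frames rather than these made-up numbers.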
