Publisher
ISCA-INT SPEECH COMMUNICATION ASSOC
DOI: 10.21437/Interspeech.2016-998
Keywords
emotion recognition; spontaneous speech; additive and convolutional noises; feature enhancement; autoencoder; LSTM Neural Networks
Funding
- EC's 7th Framework Programme through the ERC Starting Grant [338164]
- EU's Horizon Programme [644632, 645094, 645378]
- German Federal Ministry of Education, Science, Research and Technology (BMBF) [16SV7213]
During the last decade, speech emotion recognition technology has matured enough to be used in some real-life scenarios. However, these scenarios require an almost silent environment so that system performance is not compromised. Speech emotion recognition thus needs to evolve and face more challenging conditions, such as environmental additive and convolutional noise, in order to broaden its applicability to real-life conditions. This contribution evaluates the impact of a front-end feature enhancement method, based on an autoencoder with long short-term memory neural networks, on robust emotion recognition from speech. Support Vector Regression is then used as a back-end for time- and value-continuous emotion prediction from the enhanced features. We perform extensive evaluations on both non-stationary additive noise and convolutional noise, using a database of spontaneous and natural emotions. Results show that the proposed method significantly outperforms a system trained on raw features, for both the arousal and valence dimensions, while showing almost no degradation when applied to clean speech.
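To make the two noise types named in the abstract concrete, the following is a minimal sketch of the usual signal model, in which a clean signal is first convolved with an impulse response (convolutional noise, such as reverberation) and then mixed with a noise signal at a chosen signal-to-noise ratio (additive noise). It is not the authors' setup: the sine-wave "speech", the white Gaussian noise (which is stationary, unlike the noise evaluated in the paper), the impulse response `room`, the 6 dB SNR, and all function names are assumptions chosen purely for illustration.

```python
import math
import random


def convolve(signal, impulse_response):
    """Linear convolution, truncated to the length of the input signal."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(impulse_response):
            if n - k >= 0:
                acc += h * signal[n - k]
        out[n] = acc
    return out


def power(x):
    """Mean squared amplitude of a sequence."""
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(signal, noise, snr_db):
    """Scale the noise so the signal-to-noise ratio equals snr_db, then add it."""
    gain = math.sqrt(power(signal) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(signal, noise)]


def measured_snr_db(reference, corrupted):
    """SNR in dB of the corrupted signal relative to the reference."""
    residual = [y - x for x, y in zip(reference, corrupted)]
    return 10 * math.log10(power(reference) / power(residual))


random.seed(0)
clean = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]  # toy stand-in for speech
noise = [random.gauss(0.0, 1.0) for _ in range(200)]  # assumed white noise
room = [1.0, 0.0, 0.5, 0.0, 0.25]  # hypothetical impulse response

reverberant = convolve(clean, room)  # convolutional distortion
noisy = add_noise_at_snr(reverberant, noise, snr_db=6.0)  # additive distortion
```

In this model, a feature enhancement front-end such as the one evaluated in the paper would operate on features extracted from a signal like `noisy` and try to recover the features of `clean`.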