4.6 Article

Emotion classification from speech signal based on empirical mode decomposition and non-linear features

Journal

COMPLEX & INTELLIGENT SYSTEMS
Volume 7, Issue 4, Pages 1919-1934

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s40747-021-00295-z

Keywords

Speech signal; Emotion perception; Entropy measures; Linear discriminant analysis; Empirical mode decomposition

Funding

  1. Scientific Research Grant of Shantou University, China [NTF17016]


This paper investigates the recognition of seven emotional states from speech signals using entropy-based feature extraction and various classifiers, reaching a peak balanced accuracy of 93.3% with a Linear Discriminant Analysis classifier on the Toronto Emotional Speech dataset.
Emotion recognition from speech is a widely researched topic in the design of Human-Computer Interface (HCI) systems, since it offers insight into the mental state of the user; an HCI often needs to identify the speaker's emotional condition as cognitive feedback. This paper investigates the recognition of seven emotional states from speech signals: sad, angry, disgust, happy, surprise, pleasant, and neutral. The proposed method quantifies the non-linearity of the signal through a randomness measure, known as the entropy feature, for the detection of emotions. The speech signals are first decomposed into Intrinsic Mode Functions (IMFs) by Empirical Mode Decomposition, and the IMFs are grouped into dominant frequency bands: high frequency, mid frequency, and base frequency. Entropy measures are computed directly from the high-frequency band in the IMF domain, whereas for the mid- and base-frequency bands the corresponding IMFs are first averaged and the entropy measures are then computed. The entropy measures of all the emotional signals are assembled into a feature vector that captures their randomness. This feature vector is used to train several state-of-the-art classifiers: Linear Discriminant Analysis (LDA), Naive Bayes, K-Nearest Neighbor, Support Vector Machine, Random Forest, and Gradient Boosting Machine. Tenfold cross-validation on the publicly available Toronto Emotional Speech dataset shows that the LDA classifier achieves a peak balanced accuracy of 93.3%, an F1 score of 87.9%, and an area-under-the-curve value of 0.995 in recognizing emotions from the speech of native English speakers.
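As an illustration of the pipeline described in the abstract, the sketch below extracts band-wise entropy features via Empirical Mode Decomposition and scores an LDA classifier with tenfold cross-validation. This is a minimal sketch, not the authors' implementation: it assumes the PyEMD package (pip install EMD-signal) and scikit-learn, a single Shannon entropy stands in for the paper's set of entropy measures, and the exact assignment of IMFs to the high/mid/base bands is an assumption.

    import numpy as np
    from PyEMD import EMD  # pip install EMD-signal (assumed EMD implementation)
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    def shannon_entropy(x, bins=64):
        """Shannon entropy of the amplitude histogram; a stand-in for the
        paper's entropy measures, which are not specified in the abstract."""
        counts, _ = np.histogram(x, bins=bins)
        p = counts[counts > 0] / counts.sum()
        return float(-np.sum(p * np.log2(p)))

    def entropy_features(signal):
        """One entropy feature per band: high (first IMF), mid and base
        (averaged IMFs). The grouping below is an assumption; it expects
        the decomposition to yield at least three IMFs."""
        imfs = EMD().emd(np.asarray(signal, dtype=float))
        split = (len(imfs) + 1) // 2
        high = imfs[0]                    # highest-frequency oscillations
        mid = imfs[1:split].mean(axis=0)  # averaged mid-band IMFs
        base = imfs[split:].mean(axis=0)  # averaged base-band IMFs
        return np.array([shannon_entropy(b) for b in (high, mid, base)])

    def evaluate(X, y):
        """Tenfold CV of LDA; X holds one row of entropy features per
        utterance and y the emotion labels (placeholders for a corpus)."""
        cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
        scores = cross_val_score(LinearDiscriminantAnalysis(), X, y,
                                 cv=cv, scoring="balanced_accuracy")
        return scores.mean()

Balanced accuracy is used as the scoring metric to match the figure reported in the abstract; any of the other listed classifiers can be swapped in for LinearDiscriminantAnalysis without changing the cross-validation loop.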


