Article

MES-P: An Emotional Tonal Speech Dataset in Mandarin with Distal and Proximal Labels

Journal

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING
Volume 13, Issue 1, Pages 408-425

Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/TAFFC.2019.2945322

Keywords

Emotional speech; Mandarin; dataset; distal labels; proximal labels; tonal speech; emotion intensities

Funding

  1. National Natural Science Foundation of China [61906128]
  2. Natural Science Foundation of Jiangsu Province [BK20180834]
  3. French National Research Agency (Agence Nationale de la Recherche, ANR) [ANR-13-CORD-0004-02]

Abstract

Emotion shapes all aspects of our interpersonal and intellectual experiences, and its automatic analysis therefore has many applications. In this paper, we propose an emotional tonal speech dataset, the Mandarin Chinese Emotional Speech Dataset-Portrayed (MES-P), with both distal and proximal labels. In contrast with state-of-the-art datasets, which focus only on perceived emotions, MES-P includes not only perceived emotions (proximal labels) but also intended emotions (distal labels), making it possible to study human emotional intelligence, i.e., the ability to express and understand emotion, as well as the emotional misunderstandings that arise in real life. Furthermore, MES-P captures a defining feature of tonal languages and provides emotional speech samples matching the tonal distribution of real-life Mandarin. The dataset also features emotion intensity variations, introducing both moderate and intense versions of joy, anger, and sadness in addition to neutral. The collected speech samples are rated in valence-arousal (VA) space through continuous coordinate locations, yielding an emotional distribution pattern in the 2D VA space. High consistency between the speakers' emotional intentions and the listeners' perceptions is demonstrated by Cohen's kappa coefficients. Finally, extensive experiments are carried out on MES-P as a baseline for automatic emotion recognition and compared with human emotional intelligence.
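The consistency claim above rests on Cohen's kappa, which measures agreement between intended (distal) and perceived (proximal) labels beyond what chance alone would produce. Below is a minimal sketch of that computation on hypothetical categorical labels; the toy data and the cohen_kappa helper are illustrative only and not taken from the paper, whose ratings are actually collected as continuous valence-arousal coordinates before any categorical comparison.

```python
from collections import Counter

def cohen_kappa(distal, proximal):
    """Cohen's kappa between intended (distal) and perceived (proximal) labels.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from the marginal distributions.
    """
    assert len(distal) == len(proximal)
    n = len(distal)
    # Observed agreement: fraction of samples where intention == perception.
    p_o = sum(d == p for d, p in zip(distal, proximal)) / n
    # Chance agreement from the two marginal label distributions.
    d_counts, p_counts = Counter(distal), Counter(proximal)
    p_e = sum(d_counts[k] * p_counts.get(k, 0) for k in d_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels over MES-P's emotion categories (joy, anger, sadness,
# neutral); values chosen for illustration, not drawn from the dataset.
intended  = ["joy", "joy", "anger", "sadness", "neutral", "anger"]
perceived = ["joy", "neutral", "anger", "sadness", "neutral", "anger"]
print(f"kappa = {cohen_kappa(intended, perceived):.3f}")  # 0.778 on this toy data
```

The same quantity is available as sklearn.metrics.cohen_kappa_score; the hand-rolled version is shown only to make the p_o and p_e terms explicit.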
