Quantifying Emotional Similarity in Speech

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING (2023)

Journal

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING

Volume 14, Issue 2, Pages 1376-1390

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TAFFC.2021.3127390

Keywords

Task analysis; Emotion recognition; Speech recognition; Affective computing; Face recognition; Measurement; Reliability; Speech emotion recognition; ordinal affective computing; representation learning of emotion similarity; triplet loss function; speech emotion retrieval

Categories

Computer Science, Artificial Intelligence; Computer Science, Cybernetics

AI Summary

This study proposes a new formulation for measuring emotional similarity between speech recordings. Instead of predicting emotional attributes or recognizing emotional categories, the formulation exploits the ordinal nature of emotions by comparing emotional similarities. The study asks which emotional descriptors provide the most suitable space for assessing emotional similarity, and whether deep neural networks can learn representations that quantify it robustly. Using alternative emotional spaces built from attribute-based descriptors and categorical emotions, the study shows that a meaningful embedding can be learned to assess emotional similarity, outperforming human evaluators on the same task.

Abstract

This study proposes a novel formulation: measuring emotional similarity between speech recordings. The formulation explores the ordinal nature of emotions by comparing emotional similarities instead of predicting an emotional attribute or recognizing an emotional category. The proposed task determines which of two alternative samples has emotional content most similar to that of a given anchor. This task raises some interesting questions. Which emotional descriptors provide the most suitable space to assess emotional similarities? Can deep neural networks (DNNs) learn representations that robustly quantify emotional similarities? We address these questions by exploring alternative emotional spaces created with attribute-based descriptors and categorical emotions. We create the representation using a DNN trained with the triplet loss function, which relies on triplets formed from an anchor, a positive example, and a negative example. We select a positive sample whose emotional content is similar to the anchor's, and a negative sample whose emotional content is dissimilar to the anchor's. The task of our DNN is to identify the positive sample. The experimental evaluations demonstrate that we can learn a meaningful embedding to assess emotional similarities, achieving higher performance than human evaluators asked to complete the same task.
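The triplet setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Euclidean distance, the margin of 1.0, the three-dimensional attribute labels, and every function name below are assumptions made for illustration only, and the embeddings are hand-written vectors rather than DNN outputs.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def order_triplet(anchor_label, label_a, label_b):
    """Decide which of two candidates is the positive example.

    Labels are hypothetical attribute vectors (for example arousal,
    valence, dominance). The candidate whose label lies closer to the
    anchor's label is treated as the positive; the other is the negative.
    Returns (positive_index, negative_index) with a = 0 and b = 1.
    """
    a_is_closer = euclidean(anchor_label, label_a) <= euclidean(anchor_label, label_b)
    return (0, 1) if a_is_closer else (1, 0)


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the positive embedding is closer to the anchor
    than the negative embedding by at least `margin` (an assumed value).
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def pick_more_similar(anchor_emb, emb_a, emb_b):
    """Inference-time task: index (0 or 1) of the candidate embedding
    that is closer to the anchor embedding."""
    return 0 if euclidean(anchor_emb, emb_a) <= euclidean(anchor_emb, emb_b) else 1


# Hypothetical attribute labels for an anchor and two candidate recordings.
anchor_label = (5.0, 2.0, 4.0)
label_a = (5.5, 2.5, 4.0)
label_b = (2.0, 6.0, 3.0)
pos_idx, neg_idx = order_triplet(anchor_label, label_a, label_b)

# Hand-written 2-D embeddings standing in for the network's outputs.
anchor_emb = (0.0, 0.0)
candidates = [(0.0, 1.0), (3.0, 4.0)]

good = triplet_loss(anchor_emb, candidates[pos_idx], candidates[neg_idx])
bad = triplet_loss(anchor_emb, candidates[neg_idx], candidates[pos_idx])
choice = pick_more_similar(anchor_emb, candidates[0], candidates[1])
```

In this toy case the embedding already places the positive candidate closer to the anchor than the negative one by more than the margin, so `good` is zero, while swapping the roles yields a positive loss in `bad`. Training would push embeddings toward the first situation, and `pick_more_similar` mirrors the comparison task on which a learned embedding can be scored against human evaluators.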
