☆ 4.7 Article

EMOLIPS: Towards Reliable Emotional Speech Lip-Reading

MATHEMATICS (2023)

期刊

MATHEMATICS

卷 11, 期 23, 页码 -

出版社

MDPI

DOI: 10.3390/math11234787

关键词

lip-reading; visual speech recognition; emotional speech; speech to text; affective computing; deep learning; machine learning; human-computer interaction

类别

Mathematics

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

智能总结 New
摘要

This article presents a novel approach called EMOLIPS for emotional speech lip-reading. It utilizes visual data processing and deep learning techniques for speech to text recognition. By using trained emotional lip-reading models, the approach successfully addresses the issue of multi-emotional lip-reading in real-life scenarios. Experimental results show a significant improvement in phrase recognition accuracy.

In this article, we present a novel approach for emotional speech lip-reading (EMOLIPS). This two-level approach to emotional speech to text recognition based on visual data processing is motivated by human perception and the recent developments in multimodal deep learning. The proposed approach uses visual speech data to determine the type of speech emotion. The speech data are then processed using one of the emotional lip-reading models trained from scratch. This essentially resolves the multi-emotional lip-reading issue associated with most real-life scenarios. We implemented these models as a combination of EMO-3DCNN-GRU architecture for emotion recognition and 3DCNN-BiLSTM architecture for automatic lip-reading. We evaluated the models on the CREMA-D and RAVDESS emotional speech corpora. In addition, this article provides a detailed review of recent advances in automated lip-reading and emotion recognition that have been developed over the last 5 years (2018-2023). In comparison to existing research, we mainly focus on the valuable progress brought with the introduction of deep learning to the field and skip the description of traditional approaches. The EMOLIPS approach significantly improves the state-of-the-art accuracy for phrase recognition due to considering emotional features of the pronounced audio-visual speech up to 91.9% and 90.9% for RAVDESS and CREMA-D, respectively. Moreover, we present an extensive experimental investigation that demonstrates how different emotions (happiness, anger, disgust, fear, sadness, and neutral), valence (positive, neutral, and negative) and binary (emotional and neutral) affect automatic lip-reading.

EMOLIPS: Towards Reliable Emotional Speech Lip-Reading

期刊

MATHEMATICS

出版社

MDPI

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

EMOLIPS: Towards Reliable Emotional Speech Lip-Reading

期刊

MATHEMATICS

出版社

MDPI

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文