Article

GFFT: Global-local feature fusion transformers for facial expression recognition in the wild

Journal

IMAGE AND VISION COMPUTING
Volume 139, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.imavis.2023.104824

Keywords

Facial expression recognition; Cross-patch communication; Self-attention mechanism; Transformers


Summary

In this paper, the Global-local Feature Fusion Transformers (GFFT) method is proposed to address the challenges of facial expression recognition in the wild, such as facial occlusion and pose variation. The GFFT approach uses self-attentive fusion to enable cross-patch communication between features, effectively capturing global and local information. Experimental results demonstrate the effectiveness and robustness of GFFT, which outperforms existing state-of-the-art methods on multiple datasets.
Abstract

Facial expression recognition in the wild has become more challenging owing to various unconstrained conditions, such as facial occlusion and pose variation. Previous methods usually recognize expressions holistically or with relatively coarse local methods, capturing only limited features and remaining susceptible to interference. In this paper, we propose the Global-local Feature Fusion Transformers (GFFT), centered on cross-patch communication between features through self-attentive fusion, which effectively addresses facial occlusion and pose variation. First, the Global Contextual Information Perception (GCIP) module is designed to fuse global and local features and learn the relationship between them. Subsequently, the Facial Salient Feature Perception (FSFP) module is proposed to guide the fused features toward the key regions of the face using facial landmark features, further capturing face-related salient features. In addition, the Multi-scale Feature Fusion (MFF) module combines fused features from different stages to reduce the deep network's sensitivity to facial occlusion. Extensive experiments show that GFFT outperforms existing state-of-the-art methods with 92.05% on RAF-DB, 67.46% on AffectNet-7, 63.62% on AffectNet-8, and 91.04% on FERPlus, demonstrating its effectiveness and robustness.
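The abstract does not give implementation details, but the core idea of "cross-patch communication by self-attentive fusion" corresponds to a cross-attention block in which local patch tokens query global context tokens. Below is a minimal PyTorch sketch of that general technique; the CrossPatchFusion class, the dimensions, and the token counts are illustrative assumptions, not the authors' published code.

import torch
import torch.nn as nn

class CrossPatchFusion(nn.Module):
    """Hypothetical sketch of global-local fusion via cross-attention.

    Local patch tokens attend to global context tokens, so each patch
    can incorporate information from the whole face. This illustrates
    the general technique only, not the GFFT implementation.
    """

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.norm_out = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim)
        )

    def forward(self, local_tokens, global_tokens):
        # Queries come from local patches; keys/values come from global
        # features, giving every patch access to holistic context
        # (cross-patch communication).
        q = self.norm_q(local_tokens)
        kv = self.norm_kv(global_tokens)
        fused, _ = self.attn(q, kv, kv)
        x = local_tokens + fused               # residual connection
        return x + self.mlp(self.norm_out(x))  # feed-forward refinement

# Toy usage: 49 local patch tokens fused with 196 global tokens.
local_feats = torch.randn(2, 49, 256)
global_feats = torch.randn(2, 196, 256)
out = CrossPatchFusion()(local_feats, global_feats)
print(out.shape)  # torch.Size([2, 49, 256])

Under this reading, the FSFP module would supply landmark-derived tokens as an additional key/value source, and the MFF module would aggregate the fused outputs from several such stages; both are stated at this level of generality in the abstract only.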
