Article

Facial Expression Recognition Based on Spatial-Temporal Fusion with Attention Mechanism

Journal

NEURAL PROCESSING LETTERS
Volume 55, Issue 5, Pages 6109-6124

Publisher

SPRINGER
DOI: 10.1007/s11063-022-11129-5

Keywords

Facial expression recognition; Spatial-temporal feature fusion; Attention mechanism; Decision fusion


Facial expression recognition plays an important role in many applications, but not all features extracted from facial images are suitable for the task. In this paper, a spatial-temporal fusion method with an attention mechanism is proposed to improve recognition accuracy. Experimental results show performance competitive with state-of-the-art methods.
Facial expression recognition (FER) plays an important role in human-computer interaction and has been applied to fatigue detection, human-computer interactive games, social robots, teaching-effect analysis, and so on. However, not all the features extracted from facial images are suitable for FER. Moreover, spatial features or temporal features alone have limitations in characterizing facial expressions. Therefore, it is necessary to extract effective features suitable for facial expression recognition and to leverage an effective fusion method to improve the accuracy of FER. In this paper, we propose a facial expression recognition method based on spatial-temporal fusion with an attention mechanism (STAFER), which is composed of a spatial feature extractor (SFE), a temporal feature extractor (TFE), and spatial-temporal fusion (STF). Firstly, a 10-layer network taken from a pretrained VGG16 is used as the backbone of the SFE to extract the spatial features of facial expressions. To filter out features that are irrelevant to facial expression, an attention mechanism is applied to the first three convolutional blocks. Secondly, the TFE is constructed from convolutional blocks and an LSTM: facial expression sequences are fed to the convolutional blocks to extract low-level features, and these features are then fed into the LSTM to obtain temporal features. Finally, a decision-level fusion strategy is used to fuse the spatial and temporal features. The experimental results demonstrate that our proposed method achieves an accuracy of 98.05% on CK+ and 88.34% on Oulu-CASIA, which is competitive with some state-of-the-art methods.
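
The abstract describes the architecture only at a high level, so the following PyTorch sketch is merely an illustration of the described structure, not the authors' implementation: a VGG16-based spatial branch with channel attention on the early blocks, a small CNN-plus-LSTM temporal branch, and weighted decision-level fusion of the two branches' class scores. The attention form (squeeze-and-excitation style), the layer split points, the hidden sizes, and the fusion weight alpha are all assumptions.

# Hypothetical sketch of a STAFER-style model; names and hyperparameters are illustrative.
import torch
import torch.nn as nn
from torchvision import models


class ChannelAttention(nn.Module):
    # Squeeze-and-excitation style channel attention (assumed form of the paper's attention).
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)


class SpatialFeatureExtractor(nn.Module):
    # SFE: early layers of pretrained VGG16 with attention after the first three blocks (assumed split).
    def __init__(self, num_classes=7):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features
        self.blocks1_3 = nn.Sequential(*list(vgg.children())[:17])   # through the third max-pool, 256 channels
        self.attn = ChannelAttention(256)
        self.block4 = nn.Sequential(*list(vgg.children())[17:24])    # fourth conv block, 512 channels
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(512, num_classes))

    def forward(self, x):                    # x: (B, 3, 224, 224), e.g. the last frame of a sequence
        return self.head(self.block4(self.attn(self.blocks1_3(x))))


class TemporalFeatureExtractor(nn.Module):
    # TFE: per-frame convolutional blocks followed by an LSTM over the sequence (assumed sizes).
    def __init__(self, num_classes=7, hidden=256):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.lstm = nn.LSTM(64 * 4 * 4, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, seq):                  # seq: (B, T, 3, H, W)
        b, t = seq.shape[:2]
        frame_feats = self.cnn(seq.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(frame_feats)
        return self.head(out[:, -1])         # classify from the last LSTM state


class STAFER(nn.Module):
    # Decision-level fusion: weighted sum of the two branches' class scores (alpha is an assumption).
    def __init__(self, num_classes=7, alpha=0.5):
        super().__init__()
        self.sfe = SpatialFeatureExtractor(num_classes)
        self.tfe = TemporalFeatureExtractor(num_classes)
        self.alpha = alpha

    def forward(self, seq):
        spatial_scores = self.sfe(seq[:, -1])    # spatial branch sees the peak/last frame
        temporal_scores = self.tfe(seq)          # temporal branch sees the whole sequence
        return self.alpha * spatial_scores + (1 - self.alpha) * temporal_scores


if __name__ == "__main__":
    model = STAFER(num_classes=7)
    clip = torch.randn(2, 8, 3, 224, 224)        # 2 sequences of 8 frames
    print(model(clip).shape)                     # torch.Size([2, 7])

In this sketch each branch emits class scores independently and the fusion is a fixed convex combination; a learned fusion weight or per-class weighting would be an equally plausible reading of "decision-level fusion".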
