Journal
IEEE TRANSACTIONS ON AFFECTIVE COMPUTING
Volume 14, Issue 1, Pages 800-810
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TAFFC.2020.3027340
Keywords
Dynamic facial expression recognition; 3D-Inception-ResNet; channel attention; spatial-temporal attention
Abstract
Capturing the dynamics of facial expression progression in video is an essential and challenging task for facial expression recognition (FER). In this article, we propose an effective framework to address this challenge. We develop a C3D-based network architecture, 3D-Inception-ResNet, to extract spatial-temporal features from the dynamic facial expression image sequence. A Spatial-Temporal and Channel Attention Module (STCAM) is proposed to explicitly exploit the holistic spatial-temporal and channel-wise correlations among the extracted features. Specifically, the proposed STCAM calculates a channel-wise and a spatial-temporal-wise attention map to enhance the features along the corresponding feature dimensions for more representative features. We evaluate our method on three popular dynamic facial expression recognition datasets, CK+, Oulu-CASIA, and MMI. Experimental results show that our method achieves better or comparable performance compared to the state-of-the-art approaches.
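The abstract describes the STCAM as computing a channel-wise attention map and a spatial-temporal attention map that rescale the backbone features along those dimensions. The paper's exact formulation is not given here, so the following is only a minimal NumPy sketch of that general idea: channel attention from a squeezed channel descriptor, then spatial-temporal attention over the (T, H, W) locations. The function name `stcam` and the parameters `w_c` and `w_st` are hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def stcam(features, w_c, w_st):
    """Sketch of sequential channel and spatial-temporal attention.

    features: (C, T, H, W) feature volume from a 3D CNN backbone.
    w_c:      (C, C) projection for the channel descriptor (hypothetical).
    w_st:     scalar scale for the spatial-temporal map (hypothetical).
    """
    # Channel attention: squeeze the spatial-temporal dims to one
    # descriptor per channel, project, and gate each channel.
    channel_desc = features.mean(axis=(1, 2, 3))          # (C,)
    channel_att = sigmoid(w_c @ channel_desc)             # (C,)
    out = features * channel_att[:, None, None, None]

    # Spatial-temporal attention: squeeze the channel dim and gate
    # every (t, h, w) location of the enhanced features.
    st_desc = out.mean(axis=0)                            # (T, H, W)
    st_att = sigmoid(w_st * st_desc)                      # (T, H, W)
    return out * st_att[None, ...]
```

The output keeps the input shape, so a module like this can be dropped between backbone stages without changing the rest of the network, which is consistent with the abstract's description of enhancing features "along the corresponding feature dimensions".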