Patch attention convolutional vision transformer for facial expression recognition with occlusion

INFORMATION SCIENCES (2023)

Journal

INFORMATION SCIENCES

Volume 619, Pages 781-794

Publisher

ELSEVIER SCIENCE INC

DOI: 10.1016/j.ins.2022.11.068

Keywords

Facial expression recognition; Occlusion; Local and global feature; Self-attention; Vision transformer

Automated Summary

A Patch Attention Convolutional Vision Transformer (PACVT) is proposed to tackle the occlusion problem in Facial Expression Recognition (FER). It extracts local and global features from facial patches and uses self-attention to focus on salient patches with discriminative features. Experiments on three widely used expression datasets and their occlusion subsets show that PACVT outperforms state-of-the-art methods on occlusion FER.

Abstract

Despite substantial progress in Facial Expression Recognition (FER) in recent decades, most previous methods have been developed to recognize constrained facial expressions. Real-world occlusions lead to invisible facial regions and contaminated facial features, which undoubtedly increase the difficulty of FER in the wild. Therefore, a Patch Attention Convolutional Vision Transformer (PACVT) is proposed to tackle the occlusion FER problem. The backbone convolutional neural network is used to extract facial feature maps, which are cropped into multiple regional patches to extract local and global features. The Patch Attention Unit (PAU) is designed to perceive occluded regions by adaptively calculating the patch-level attention weights of local features for expression recognition. The facial patches are mapped into sequences of visual tokens, and the Vision Transformer (ViT) is employed to capture the interactions and correlations between these visual tokens from a global perspective. The self-attention in ViT enables the PACVT to focus on the salient patches with discriminative features and ignore the occlusion. Experiments are conducted on three widely used expression datasets and their occlusion subsets, and the results demonstrate that the proposed PACVT outperforms state-of-the-art methods on occlusion FER. Cross-dataset experiment results evidence the generalization ability of the PACVT. (c) 2022 Elsevier Inc. All rights reserved.
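
The pipeline described in the abstract (crop feature maps into patches, weight the patches, then let tokens attend to each other) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the linear scoring vector, the use of a softmax to normalize patch weights, and the identity query/key/value projections are all assumptions made for illustration, and the toy 4x4 feature map stands in for a real CNN output.

```python
import math


def split_into_patches(feature_map, patch_size):
    """Crop a 2-D feature map (list of rows) into non-overlapping square
    patches and flatten each patch into a vector."""
    height, width = len(feature_map), len(feature_map[0])
    patches = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            patches.append([feature_map[r][c]
                            for r in range(top, top + patch_size)
                            for c in range(left, left + patch_size)])
    return patches


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def patch_attention(patches, scoring_weights):
    """Assumed stand-in for a patch attention unit: score each patch with a
    linear map, normalize the scores across patches, and return the weights
    together with the attention-weighted sum of the patch vectors."""
    scores = [sum(w * x for w, x in zip(scoring_weights, patch))
              for patch in patches]
    weights = softmax(scores)
    dim = len(patches[0])
    pooled = [sum(weights[i] * patches[i][d] for i in range(len(patches)))
              for d in range(dim)]
    return weights, pooled


def self_attention(tokens):
    """Single-head scaled dot-product self-attention. Identity projections
    are assumed for Q, K and V to keep the sketch short."""
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        attn = softmax(scores)
        outputs.append([sum(attn[j] * tokens[j][d] for j in range(len(tokens)))
                        for d in range(dim)])
    return outputs


# Toy feature map: the top-left 2x2 region is zeroed to mimic an occlusion.
feature_map = [
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 0.0, 3.0, 1.0],
    [2.0, 1.0, 1.0, 1.0],
    [1.0, 2.0, 2.0, 2.0],
]
patches = split_into_patches(feature_map, patch_size=2)
weights, pooled = patch_attention(patches, scoring_weights=[0.5] * 4)
contextual = self_attention(patches)  # each patch vector acts as one token
```

In this toy setup the zeroed patch receives the smallest attention weight, which mirrors the idea of down-weighting occluded regions. A real model would learn the scoring and projection parameters and would typically use a sigmoid or softmax gate chosen by the authors, which the abstract does not specify.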
