Article

Dynamic multi-channel metric network for joint pose-aware and identity-invariant facial expression recognition

Journal

INFORMATION SCIENCES
Volume 578, Pages 195-213

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2021.07.034

Keywords

Multi-view facial expression recognition; Pose-aware; Identity-invariant; Multi-channel metric learning; Dynamic weight; Multi-task learning

Funding

  1. Shenzhen Fundamental Research grant [JCYJ20180508162406177, JCYJ20190813170601651]
  2. National Natural Science Foundation of China grant [62076227]
  3. Wuhan Applied Fundamental Frontier Project Grant [2020010601012166]
  4. Shenzhen Institute of Artificial Intelligence and Robotics for Society [AC01202005024, AC01202108001-04, AC01202101010]

Abstract

Facial expression recognition is challenging due to variations in head pose and inter-subject characteristics. The proposed DML-Net utilizes multiple channels to learn fused global and local features and explore identity-invariant and pose-aware expression representations, leading to improved recognition performance.
Facial expression recognition (FER) is challenging because the appearance of an expression varies significantly depending on head pose and inter-subject characteristics. With existing techniques, it is often difficult to learn both pose-aware and identity-invariant representations of facial expressions effectively due to the complex distribution of intra-class variation and similarity caused by these two factors. In this study, we propose a dynamic multi-channel metric learning network for pose-aware and identity-invariant FER, called DML-Net, which can reduce the effects of pose and identity for robust FER performance. Specifically, DML-Net uses three parallel multi-channel convolutional networks to learn fused global and local features from different facial regions. Then it uses joint embedded feature learning to explore identity-invariant and pose-aware expression representations from fused region-based features in an embedding space. DML-Net is end-to-end trainable by minimizing deep multiple metric losses, FER loss, and pose estimation loss with dynamically learned loss weights, thereby suppressing overfitting and significantly improving recognition. We evaluate DML-Net on three widely-used multi-view facial expression data sets, namely, KDEF, BU-3DFE, and Multi-PIE, as well as a wild dataset SFEW2.0. Extensive experiments demonstrate that our approach outperforms several other popular methods with accuracies of 88.2% on KDEF, 83.5% on BU-3DFE, 93.5% on Multi-PIE, and 54.36% on SFEW. (c) 2021 Elsevier Inc. All rights reserved.
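The abstract states that DML-Net is trained end-to-end by minimizing multiple metric losses, an FER loss, and a pose-estimation loss with dynamically learned loss weights. As a rough, hedged illustration of that idea only, the PyTorch sketch below combines three task losses using learnable uncertainty-based weights (in the style of Kendall et al.); the class and function names, the triplet-loss choice for the metric term, and the weighting rule are assumptions for illustration, not the paper's published implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicWeightedLoss(nn.Module):
    """Combine several task losses with dynamically learned weights.

    Hypothetical sketch: the abstract only says the loss weights are
    "dynamically learned"; homoscedastic-uncertainty weighting is used
    here as a stand-in and may differ from the paper's actual scheme.
    """

    def __init__(self, num_tasks: int = 3):
        super().__init__()
        # One learnable log-variance per task; the effective weights adapt during training.
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, losses):
        total = 0.0
        for i, loss in enumerate(losses):
            precision = torch.exp(-self.log_vars[i])  # dynamic weight for task i
            total = total + precision * loss + self.log_vars[i]  # regularizer keeps weights finite
        return total


def dml_net_losses(embeddings, expr_logits, pose_logits,
                   expr_labels, pose_labels, margin: float = 0.2):
    """Illustrative per-batch losses for the three objectives named in the abstract.

    `embeddings`, `expr_logits`, and `pose_logits` would come from the fused
    region-based feature branches of DML-Net (hypothetical interface).
    Assumes the batch is arranged as consecutive (anchor, positive, negative) triplets.
    """
    # Metric loss in the embedding space (triplet loss chosen here as one example of a metric loss).
    metric_loss = F.triplet_margin_loss(
        anchor=embeddings[0::3], positive=embeddings[1::3],
        negative=embeddings[2::3], margin=margin)
    # Expression-recognition and pose-estimation losses as plain cross-entropy.
    fer_loss = F.cross_entropy(expr_logits, expr_labels)
    pose_loss = F.cross_entropy(pose_logits, pose_labels)
    return [metric_loss, fer_loss, pose_loss]
```

A training step would then call `DynamicWeightedLoss(num_tasks=3)` on the list returned by `dml_net_losses` and backpropagate the scalar result, so the relative task weights are optimized jointly with the network parameters rather than tuned by hand.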
