Article

In search of a robust facial expressions recognition model: A large-scale visual cross-corpus study

Journal

NEUROCOMPUTING
Volume 514, Pages 435-450

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2022.10.013

Keywords

Visual emotion recognition; Affective computing; Paralinguistic analysis; Cross-corpus analysis; Deep learning; End-to-end model

Funding

  1. Analytical Center for the Government of the Russian Federation
  2. [IGK 000000D730321P5Q0002]
  3. [70-2021-00141]


Abstract

Researchers have been seeking a robust emotion recognition system for the past two decades. Such a system would advance computer systems to a new level of interaction, providing much more natural feedback during human-computer interaction through analysis of the user's affective state. However, one of the key problems in this domain is a lack of generalization ability: model performance degrades dramatically when a model is trained on one corpus and evaluated on another. Although some studies have been done in this direction, the visual modality remains under-investigated. We therefore introduce a visual cross-corpus study conducted on eight corpora, which differ in recording conditions, participants' appearance characteristics, and complexity of data processing. We propose a visual-based end-to-end emotion recognition framework, which consists of a robust pre-trained backbone model and a temporal sub-system that models temporal dependencies across many video frames. In addition, a detailed analysis of the mistakes and advantages of the backbone model is provided, demonstrating its high generalization ability. Our results show that the backbone model achieved an accuracy of 66.4% on the AffectNet dataset, outperforming all state-of-the-art results. Moreover, the CNN-LSTM model demonstrated decent efficacy on dynamic visual datasets in cross-corpus experiments, achieving results comparable with the state of the art. We also provide the backbone and CNN-LSTM models for future researchers; they can be accessed via GitHub.

(c) 2022 Elsevier B.V. All rights reserved.
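The abstract describes a two-stage pipeline: a pre-trained CNN backbone extracts a feature vector from each video frame, and an LSTM-based temporal sub-system aggregates those features across frames before emotion classification. The following is a minimal NumPy sketch of that frame-features-then-LSTM flow only; the random untrained weights, the 48x48 RGB frame size, the dimensions, and the function names are all illustrative assumptions, not the authors' released models.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT_DIM, HID_DIM, N_EMOTIONS = 64, 32, 7  # small illustrative sizes

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Stand-in for the pre-trained CNN backbone: a fixed random projection
# from a flattened 48x48x3 frame to a feature vector. A real system
# would load the authors' released backbone weights instead.
W_backbone = rng.standard_normal((48 * 48 * 3, FEAT_DIM)) * 0.01

def backbone_features(frame):
    return np.tanh(frame.reshape(-1) @ W_backbone)

# Single-layer LSTM (the temporal sub-system) plus a linear classifier,
# all with random untrained weights for the sketch.
Wx = rng.standard_normal((FEAT_DIM, 4 * HID_DIM)) * 0.1
Wh = rng.standard_normal((HID_DIM, 4 * HID_DIM)) * 0.1
b = np.zeros(4 * HID_DIM)
W_out = rng.standard_normal((HID_DIM, N_EMOTIONS)) * 0.1

def predict_emotion_probs(video):
    """video: array of shape (T, 48, 48, 3) -> emotion probabilities."""
    h = np.zeros(HID_DIM)
    c = np.zeros(HID_DIM)
    for frame in video:
        x = backbone_features(frame)          # per-frame embedding
        z = x @ Wx + h @ Wh + b               # all four gates at once
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    logits = h @ W_out                        # classify the final state
    e = np.exp(logits - logits.max())         # stable softmax
    return e / e.sum()

video = rng.random((10, 48, 48, 3))           # ten dummy frames
probs = predict_emotion_probs(video)          # one distribution over 7 emotions
```

The design point the sketch illustrates is that the backbone runs independently per frame, so it can be trained and evaluated on static datasets such as AffectNet, while only the lightweight recurrent head needs the dynamic (video) corpora.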

