Article

Fast and Accurate Facial Expression Image Classification and Regression Method Based on Knowledge Distillation

Journal

APPLIED SCIENCES-BASEL
Volume 13, Issue 11, Pages -

Publisher

MDPI
DOI: 10.3390/app13116409

Keywords

facial expression classification; facial expression regression; arousal; valence; knowledge distillation


To build a facial expression recognition system suitable for practical applications, this study proposes a method that maximizes the synergy of jointly learning discrete and continuous emotional states through a knowledge distillation structure and a teacher-bounded loss function. Using Emonet as the teacher model and a lightweight network as the student model, the authors show that performance degradation can be minimized while computational efficiency is significantly improved. The proposed method is optimized for application-level interaction systems in terms of both computational cost and accuracy.
As emotional states are diverse, simply classifying them into discrete facial expressions has its limitations. Therefore, to create a facial expression recognition system for practical applications, not only must facial expressions be classified, but emotional changes must also be measured as continuous values. Based on a knowledge distillation structure and a teacher-bounded loss function, we propose a method that maximizes the synergistic effect of jointly learning discrete and continuous emotional states, namely eight expression classes, valence, and arousal. The proposed knowledge distillation model uses Emonet, a state-of-the-art continuous estimation method, as the teacher model and a lightweight network as the student model. It was confirmed that performance degradation can be minimized even though the student models require only approximately 3.9 G and 0.3 G multiply-accumulate operations when using EfficientFormer and MobileNetV2, respectively, far less than the 16.99 G required by the teacher model. Together with the significant improvements in computational efficiency (by 4.35 and 56.63 times using EfficientFormer and MobileNetV2, respectively), the decreases in facial expression classification accuracy were only approximately 1.35% and 1.64%, respectively. Therefore, the proposed method is optimized for application-level interaction systems in terms of both the amount of computation required and the accuracy.
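The abstract describes a joint objective that distills both the discrete expression distribution and the continuous valence/arousal predictions from the teacher, with a teacher-bounded term limiting the regression penalty. The following minimal sketch illustrates how such an objective could be composed (assuming a PyTorch setup; the function names, loss weights, margin, and the exact teacher-bounded formulation are illustrative assumptions, not the paper's implementation):

    import torch.nn.functional as F

    def teacher_bounded_regression_loss(student_va, teacher_va, target_va, margin=0.0):
        # Penalize the student's valence/arousal error only while it is
        # worse than the teacher's (one common "teacher-bounded" formulation).
        student_err = (student_va - target_va).pow(2).sum(dim=1)
        teacher_err = (teacher_va - target_va).pow(2).sum(dim=1)
        mask = (student_err + margin > teacher_err).float()
        return (mask * student_err).mean()

    def joint_kd_loss(student_logits, student_va, teacher_logits, teacher_va,
                      expr_labels, va_labels, temperature=2.0,
                      w_cls=1.0, w_kd=1.0, w_va=1.0, w_bound=1.0):
        # Hard-label classification loss over the eight expression classes.
        cls_loss = F.cross_entropy(student_logits, expr_labels)
        # Soft-label distillation from the teacher's class distribution.
        kd_loss = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=1),
            F.softmax(teacher_logits / temperature, dim=1),
            reduction="batchmean") * temperature ** 2
        # Direct regression to ground-truth valence/arousal.
        va_loss = F.mse_loss(student_va, va_labels)
        # Teacher-bounded term for the continuous outputs.
        bound_loss = teacher_bounded_regression_loss(student_va, teacher_va, va_labels)
        return w_cls * cls_loss + w_kd * kd_loss + w_va * va_loss + w_bound * bound_loss

In this sketch the bounded term stops contributing once the student's valence/arousal error drops below the teacher's, so the student is guided by the teacher without being forced to reproduce its residual errors.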
