Article

Fast and Accurate Facial Expression Image Classification and Regression Method Based on Knowledge Distillation

Journal

APPLIED SCIENCES-BASEL
Volume 13, Issue 11, Pages -

Publisher

MDPI
DOI: 10.3390/app13116409

Keywords

facial expression classification; facial expression regression; arousal; valence; knowledge distillation


As emotional states are diverse, simply classifying them into discrete facial expressions has its limitations. Therefore, to create a facial expression recognition system for practical applications, facial expressions must not only be classified, but emotional changes must also be measured as continuous values. Based on a knowledge distillation structure and a teacher-bounded loss function, we propose a method that maximizes the synergistic effect of jointly learning discrete and continuous emotional states: eight expression classes together with valence and arousal levels. The proposed knowledge distillation model uses Emonet, a state-of-the-art continuous estimation method, as the teacher model, and a lightweight network as the student model. It was confirmed that performance degradation can be minimized even though the student models require only approximately 3.9 G and 0.3 G multiply-accumulate operations when using EfficientFormer and MobileNetV2, respectively, which is much less than the computation required by the teacher model (16.99 G). Together with the significant improvements in computational efficiency (by 4.35 and 56.63 times using EfficientFormer and MobileNetV2, respectively), the decreases in facial expression classification accuracy were only approximately 1.35% and 1.64%, respectively. Therefore, the proposed method is optimized for application-level interaction systems in terms of both the amount of computation required and the accuracy.
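The teacher-bounded loss mentioned in the abstract can be sketched in a few lines. The sketch below follows the common formulation of a teacher-bounded regression loss (the student's regression error is penalized only while it is not clearly better than the teacher's); it is a minimal illustration, not the paper's implementation, and the function name, margin value, and toy valence/arousal numbers are all assumptions.

```python
import numpy as np

def teacher_bounded_regression_loss(student_pred, teacher_pred, target, margin=0.05):
    """Teacher-bounded L2 loss for continuous targets (e.g., valence/arousal).

    The student incurs its own L2 error only while that error, plus a margin,
    exceeds the teacher's error; once the student beats the teacher by the
    margin, the loss vanishes. Names and margin are illustrative assumptions.
    """
    student_err = np.mean((student_pred - target) ** 2)
    teacher_err = np.mean((teacher_pred - target) ** 2)
    if student_err + margin > teacher_err:
        return student_err  # student still penalized toward the ground truth
    return 0.0              # student already clearly better than the teacher

# Toy valence/arousal pair in [-1, 1] (hypothetical values)
target = np.array([0.4, -0.2])
teacher = np.array([0.42, -0.22])   # teacher is close to the ground truth
bad_student = np.array([0.9, 0.5])  # far off: loss stays active
print(teacher_bounded_regression_loss(bad_student, teacher, target))
print(teacher_bounded_regression_loss(target, teacher, target))  # perfect student: 0.0
```

In a joint training setup like the one the abstract describes, a term of this form for valence and arousal would simply be added to the cross-entropy loss over the eight discrete expression classes.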

