Article

Facial Expressions Recognition for Human-Robot Interaction Using Deep Convolutional Neural Networks with Rectified Adam Optimizer

SENSORS (2020)

Journal

SENSORS

Volume 20, Issue 8

Publisher

MDPI

DOI: 10.3390/s20082393

Keywords

computer vision; deep learning; convolutional neural networks; advanced intelligent control; facial emotion recognition; face recognition; NAO robot

Funding

UEFISCDI Multi-MonD2 Project [PN-III-P1-1.2-PCCDI2017-0637/33PCCDI/01.03.2018]
Romanian Ministry of Research and Innovation, CCCDI-UEFISCDI [PN-III-P1-1.2-PCCDI-2017-0086/22 PCCDI/2018]
Yanshan University: Joint Laboratory of Intelligent Rehabilitation Robot project [KY201501009]
Yanshan University, China
Romanian Academy, IMSAR, RO
European Commission Marie Sklodowska-Curie SMOOTH project, Smart Robots for Fire-Fighting [H2020-MSCA-RISE-2016-73487]

Abstract

This paper presents human interaction with a NAO robot based on deep convolutional neural networks (CNNs). It uses an innovative end-to-end pipeline that applies two optimized CNNs, one for face recognition (FR) and one for facial expression recognition (FER), to achieve real-time inference speed for the entire process. Two different FR models are considered: one known to be very accurate but slow at inference (the faster region-based convolutional neural network, Faster R-CNN), and one that is less accurate but fast at inference (the single shot detector convolutional neural network, SSD CNN). For emotion recognition, transfer learning and fine-tuning were applied to three CNN models (VGG, Inception V3, and ResNet). The overall results show that the SSD CNN and Faster R-CNN face detection models achieve almost the same accuracy: 97.8% for Faster R-CNN on the PASCAL visual object classes (PASCAL VOC) evaluation metrics and 97.42% for SSD Inception. For FER, ResNet obtained the highest training accuracy (90.14%), while the visual geometry group (VGG) network reached 87% and Inception V3 reached 81%. Using two serialized CNNs instead of the FER CNN alone improved results by over 10%, while the recent rectified adaptive moment optimization method (RAdam) led to better generalization and an accuracy improvement of 3-4% for each emotion recognition CNN.
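The abstract credits RAdam with a 3-4% accuracy gain. The following sketch shows how a single RAdam parameter update works, following the commonly published description of Rectified Adam (Liu et al.). It is an illustrative, pure-Python assumption and not the authors' training code. The class name `RAdam`, the default hyperparameters, and the rectification threshold of 4 are assumptions; some library implementations use a threshold of 5 instead. During the first few steps, the variance of the adaptive learning rate is not yet tractable, so the optimizer falls back to momentum SGD. After that, it scales the Adam step by a rectification factor `r_t`.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class RAdam:
    """Hypothetical minimal RAdam optimizer over a flat list of float parameters."""
    lr: float = 1e-3
    beta1: float = 0.9
    beta2: float = 0.999
    eps: float = 1e-8
    m: List[float] = field(default_factory=list)  # first-moment estimates
    v: List[float] = field(default_factory=list)  # second-moment estimates
    t: int = 0                                    # step counter

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        if not self.m:  # lazily allocate moment buffers on the first call
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        b1, b2, t = self.beta1, self.beta2, self.t

        # Maximum length of the approximated simple moving average, and its value at step t.
        rho_inf = 2.0 / (1.0 - b2) - 1.0
        rho_t = rho_inf - 2.0 * t * b2 ** t / (1.0 - b2 ** t)

        # Rectification term, only defined once the variance is tractable (assumed threshold: 4).
        r_t = None
        if rho_t > 4.0:
            r_t = math.sqrt(((rho_t - 4.0) * (rho_t - 2.0) * rho_inf)
                            / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t))

        new_params = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = b1 * self.m[i] + (1.0 - b1) * g
            self.v[i] = b2 * self.v[i] + (1.0 - b2) * g * g
            m_hat = self.m[i] / (1.0 - b1 ** t)  # bias-corrected momentum
            if r_t is not None:
                # Adaptive (Adam-style) step, scaled by the rectification factor.
                v_hat = math.sqrt(self.v[i] / (1.0 - b2 ** t))
                p = p - self.lr * r_t * m_hat / (v_hat + self.eps)
            else:
                # Early steps: variance not tractable, fall back to momentum SGD.
                p = p - self.lr * m_hat
            new_params.append(p)
        return new_params
```

As a usage example, minimizing f(x) = x² from x = 1 with `lr=0.1` gives a first step of x = 1 - 0.1 * 2 = 0.8, because step 1 uses the momentum-SGD branch. In practice, deep learning frameworks ship their own RAdam implementations (for example, `torch.optim.RAdam` in recent PyTorch versions), and those would be used to train networks such as VGG, Inception V3, or ResNet.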
