Article

Deep Head Pose: Gaze-Direction Estimation in Multimodal Video

Journal

IEEE Transactions on Multimedia
Volume 17, Issue 11, Pages 2094-2107

Publisher

IEEE (Institute of Electrical and Electronics Engineers, Inc.)
DOI: 10.1109/TMM.2015.2482819

Keywords

Convolutional neural networks (CNNs); deep learning; gaze direction; head-pose; RGB-D

Funding

  1. Engineering and Physical Sciences Research Council (EPSRC) [EP/K014277/1]
  2. MOD University Defence Research Collaboration in Signal Processing

In this paper, we present a convolutional neural network (CNN)-based model for human head-pose estimation in low-resolution multimodal RGB-D data. We pose the problem as one of classifying human gaze direction. We further fine-tune a regressor based on the learned deep classifier, and then combine the two models (classification and regression) to estimate an approximate confidence for the regression output. We present state-of-the-art results on datasets that span the range from high-resolution human-robot interaction data (close-up faces plus depth information) to challenging low-resolution outdoor surveillance data. Building on our robust head-pose estimation, we further introduce a new visual attention model to recover interaction with the environment. Using this probabilistic model, we show that many higher-level scene-understanding tasks, such as human-human and human-scene interaction detection, can be achieved. Our solution runs in real time on commercial hardware.
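The abstract does not say exactly how the classifier and the regressor are combined into a confidence estimate. One plausible reading, sketched here purely as an assumption, is to discretize yaw into angular sectors (the classifier's classes) and score a regressed angle by the softmax probability the classifier assigns to the sector containing it: when the two models agree the score is high, and when they disagree it is low. The 8-sector layout, the example logits, and the names `bin_of` and `regression_confidence` are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical discretization: 8 yaw classes, each a 45-degree sector
# centred on 0, 45, ..., 315 degrees. The paper's actual class layout may differ.
NUM_BINS = 8
BIN_WIDTH = 360.0 / NUM_BINS


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bin_of(angle_deg):
    """Index of the sector whose centre is nearest to angle_deg (wraps at 360)."""
    return int(math.floor((angle_deg % 360.0) / BIN_WIDTH + 0.5)) % NUM_BINS


def regression_confidence(class_logits, regressed_yaw):
    """Hypothetical confidence: classifier probability of the regressed angle's sector."""
    assert len(class_logits) == NUM_BINS, "one logit per yaw sector expected"
    probs = softmax(class_logits)
    return probs[bin_of(regressed_yaw)]


if __name__ == "__main__":
    # Illustrative logits strongly favouring the 0-degree sector.
    logits = [4.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
    print(regression_confidence(logits, 5.0))    # regressor agrees: high confidence
    print(regression_confidence(logits, 180.0))  # regressor disagrees: low confidence
```

Scoring agreement between the two heads, rather than trusting the regressor alone, is one simple way a classifier trained on the same features could flag unreliable continuous estimates; a circular-distance or expected-angle comparison would be an equally reasonable alternative under the same assumptions.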
