4.7 Article

Action recognition for depth video using multi-view dynamic images

Journal

INFORMATION SCIENCES
Volume 480, Issue -, Pages 287-304

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2018.12.050

Keywords

Action recognition; Depth video; Multi-view dynamic image; Convolutional neural network; Action proposal

Funding

  1. Natural Science Foundation of China [61502187, 61876211, 61602193, 61702182]
  2. National Key R&D Program of China [2018YFB1004600]
  3. International Science and Technology Cooperation Program of Hubei Province, China [2017AHB051]
  4. HUST Interdisciplinary Innovation Team Foundation [2016JCTD120]
  5. Hunan Provincial Natural Science Foundation of China [2018JJ3254]
  6. Singapore government's Research, Innovation and Enterprise 2020 plan (Advanced Manufacturing and Engineering domain) [A1687b0033]

Ask authors/readers for more resources

Dynamic imaging is a recently proposed action description paradigm for simultaneously capturing motion and temporal evolution information, particularly in the context of deep convolutional neural networks (CNNs). Compared with optical flow for motion characterization, dynamic imaging exhibits superior efficiency and compactness. Inspired by the success of dynamic imaging in RGB video, this study extends it to the depth domain. To better exploit three-dimensional (3D) characteristics, multi-view dynamic images are proposed. In particular, the raw depth video is densely projected with respect to different virtual imaging viewpoints by rotating the virtual camera within the 3D space. Subsequently, dynamic images are extracted from the obtained multi-view depth videos and multi-view dynamic images are thus constructed from these images. Accordingly, more view-tolerant visual cues can be involved. A novel CNN model is then proposed to perform feature learning on multi-view dynamic images. Particularly, the dynamic images from different views share the same convolutional layers but correspond to different fully connected layers. This is aimed at enhancing the tuning effectiveness on shallow convolutional layers by alleviating the gradient vanishing problem. Moreover, as the spatial occurrence variation of the actions may impair the CNN, an action proposal approach is also put forth. In experiments, the proposed approach can achieve state-of-the-art performance on three challenging datasets. (C) 2018 Elsevier Inc. All rights reserved.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available