4.5 Article

Pose-Guided Inflated 3D ConvNet for action recognition in videos

Journal

SIGNAL PROCESSING-IMAGE COMMUNICATION
Volume 91

Publisher

ELSEVIER
DOI: 10.1016/j.image.2020.116098

Keywords

Action recognition; Pose estimation; Spatial-temporal information; Feature fusion

Funding

  1. National Natural Science Foundation of China [61503017]
  2. China Postdoctoral Science Foundation [2019M661999]
  3. Natural Science Foundation of the Jiangsu Higher Education Institutions of China [19KJB520009]

Abstract

A novel Pose-Guided Inflated 3D ConvNet framework (PI3D) is proposed in this paper, which outperforms existing methods on human action recognition. The framework utilizes a pose module and a hierarchical pose-based network to improve performance.
Human action recognition in videos remains an important yet challenging task. Existing methods based on RGB images or optical flow are easily affected by clutter and ambiguous backgrounds. In this paper, we propose a novel Pose-Guided Inflated 3D ConvNet framework (PI3D) to address this issue. First, we design a spatial-temporal pose module, which provides essential clues for the Inflated 3D ConvNet (I3D). The pose module consists of pose estimation and pose-based action recognition. Second, for the multi-person estimation task, the introduced pose estimation network can determine the action most relevant to the action category. Third, we propose a hierarchical pose-based network to learn the spatial-temporal features of human pose. Moreover, the pose-based network and the I3D network are fused at the last convolutional layer without loss of performance. Finally, experimental results on four datasets (HMDB-51, SYSU 3D, JHMDB and Sub-JHMDB) demonstrate that the proposed PI3D framework outperforms existing methods on human action recognition. This work also shows that pose cues significantly improve the performance of I3D.
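
The abstract describes fusing the pose-based stream with the I3D stream at the last convolutional layer. The sketch below illustrates one way such late fusion can be wired up; it is a minimal illustrative sketch, not the authors' implementation. The TinyStream stand-in networks, the channel counts, and the use of 17 per-joint heatmap channels as the pose input are all assumptions; the sketch only shows channel-wise concatenation of the two streams' final convolutional features, followed by pooling and a linear classifier.

```python
# Hypothetical sketch of last-conv-layer fusion between an I3D-style RGB stream
# and a pose-feature stream. All module names and sizes are illustrative
# assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn


class TinyStream(nn.Module):
    """Stand-in 3D ConvNet producing a spatio-temporal feature map."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(64, out_channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.features(x)


class LateFusionPI3D(nn.Module):
    """Concatenate both streams' last conv features, then classify."""

    def __init__(self, num_classes: int = 51):
        super().__init__()
        self.rgb_stream = TinyStream(in_channels=3, out_channels=256)    # I3D-like RGB stream
        self.pose_stream = TinyStream(in_channels=17, out_channels=128)  # assumed per-joint heatmaps
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.classifier = nn.Linear(256 + 128, num_classes)

    def forward(self, rgb: torch.Tensor, pose: torch.Tensor) -> torch.Tensor:
        # Fuse at the last convolutional layer by channel concatenation.
        fused = torch.cat([self.rgb_stream(rgb), self.pose_stream(pose)], dim=1)
        fused = self.pool(fused).flatten(1)   # (N, 256 + 128)
        return self.classifier(fused)         # class logits


if __name__ == "__main__":
    model = LateFusionPI3D(num_classes=51)    # HMDB-51 has 51 action classes
    rgb = torch.randn(2, 3, 16, 112, 112)     # N, C, T, H, W video clip
    pose = torch.randn(2, 17, 16, 112, 112)   # assumed joint-heatmap volumes
    print(model(rgb, pose).shape)             # torch.Size([2, 51])
```

Fusing only at the final convolutional stage, as sketched here, lets each stream keep its own spatio-temporal feature hierarchy until the classifier, which is consistent with the abstract's claim that the fusion incurs no loss of performance.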
