3.8 Proceedings Paper

DECOUPLING LOCALIZATION AND CLASSIFICATION IN SINGLE SHOT TEMPORAL ACTION DETECTION

Publisher

IEEE
DOI: 10.1109/ICME.2019.00224

Keywords

Video Analysis; Temporal Action Detection; Action Proposal; Action Recognition

Funding

  1. National Key R&D Program of China [2018YFB0203904]
  2. National Natural Science Foundation of China [U1611261]
  3. Program for Guangdong Introducing Innovative and Enterpreneurial Teams [2016ZT06D211]

Ask authors/readers for more resources

Video temporal action detection aims to temporally localize and recognize the action in untrimmed videos. Existing one-stage approaches mostly focus on unifying two subtasks, i.e., localization of action proposals and classification of each proposal through a fully shared backbone. However, such design of encapsulating all components of two subtasks in one single network might restrict the training by ignoring the specialized characteristic of each subtask. In this paper, we propose a novel Decoupled Single Shot temporal Action Detection (Decouple-SSAD) method to mitigate such problem by decoupling the localization and classification in a one-stage scheme. Particularly, two separate branches are designed in parallel to enable each component to own representations privately for accurate localization or classification. Each branch produces a set of action anchor layers by applying deconvolution to the feature maps of the main stream. High-level semantic information from deeper layers is thus incorporated to enhance the feature representations. We conduct extensive experiments on THUMOS14 dataset and demonstrate superior performance over state-of-the-art methods. Our code is available online(1).

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

3.8
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available