4.7 Article

Multi-Scale Structure-Aware Network for Weakly Supervised Temporal Action Detection

Journal

IEEE TRANSACTIONS ON IMAGE PROCESSING
Volume 30, Pages 5848-5861

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TIP.2021.3089361

Keywords

Proposals; Feature extraction; Image segmentation; Scalability; Noise measurement; Graph neural networks; GSM; Weakly supervised; action detection; multi-scale; structure-aware

Funding

  1. National Natural Science Foundation of China [62022078, 62021001]
  2. Open Project Program of the National Laboratory of Pattern Recognition (NLPR) [202000019]
  3. Youth Innovation Promotion Association CAS [2018166]


This paper proposes an end-to-end Multi-Scale Structure-Aware Network (MSA-Net) for weakly supervised temporal action detection. MSA-Net explores both global and local structure information to learn discriminative structure-aware representations for robust and complete action detection. Extensive experimental results on two benchmark datasets demonstrate that MSA-Net outperforms state-of-the-art methods.
Weakly supervised temporal action detection has better scalability and practicability than fully supervised action detection in real-world deployment. However, it is difficult to learn a robust model without temporal action boundary annotations. In this paper, we propose an end-to-end Multi-Scale Structure-Aware Network (MSA-Net) for weakly supervised temporal action detection by exploring both the global structure information of a video and the local structure information of actions. The proposed MSA-Net enjoys several merits. First, to localize actions with different durations, each video is encoded into feature representations with different temporal scales. Second, based on the multi-scale feature representation, the proposed model uses two structure modeling mechanisms, global structure modeling and local structure modeling, which learn discriminative structure-aware representations for robust and complete action detection. To the best of our knowledge, this is the first work to fully explore the global and local structure information in a unified deep model for weakly supervised action detection. Extensive experimental results on two benchmark datasets demonstrate that the proposed MSA-Net performs favorably against state-of-the-art methods.
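
The abstract does not say how the multi-scale encoding is computed. As a rough illustration of the general idea only, the sketch below builds several temporal scales from per-snippet features by averaging non-overlapping windows. The function names, the window sizes (1, 2, 4), and the use of average pooling are all assumptions made for this example, not the MSA-Net design.

```python
from typing import Dict, List, Sequence

Feature = List[float]


def temporal_avg_pool(features: Sequence[Feature], window: int) -> List[Feature]:
    """Average non-overlapping runs of `window` consecutive snippet features.

    Hypothetical helper: a stand-in for whatever encoder the paper uses.
    A trailing run shorter than `window` is averaged over its own length.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    pooled: List[Feature] = []
    for start in range(0, len(features), window):
        chunk = features[start:start + window]
        dim = len(chunk[0])
        pooled.append([sum(f[d] for f in chunk) / len(chunk) for d in range(dim)])
    return pooled


def multi_scale_encode(
    features: Sequence[Feature], windows: Sequence[int] = (1, 2, 4)
) -> Dict[int, List[Feature]]:
    """Return one pooled feature sequence per temporal scale (assumed scales)."""
    return {w: temporal_avg_pool(features, w) for w in windows}


if __name__ == "__main__":
    # Five toy snippets with two-dimensional features.
    snippets = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0], [9.0, 8.0]]
    for window, seq in multi_scale_encode(snippets).items():
        print(window, len(seq))
```

Coarser scales summarize longer stretches of video, which is one simple way to give long and short actions comparable representations.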
