Article

Bilateral Relation Distillation for Weakly Supervised Temporal Action Localization

Journal

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPAMI.2023.3284853

Keywords

Bilateral relation distillation; weakly supervised learning; temporal action localization

This paper proposes Bilateral Relation Distillation (BRD), a method for weakly supervised temporal action localization. BRD learns representations by jointly modeling category-level and sequence-level relations. It captures category-level relations through correlation alignment and category-aware contrast, and it models relations among segments at the sequence level with a gradient-based feature augmentation method.
Weakly supervised temporal action localization (WSTAL), which aims to locate the time intervals of actions in an untrimmed video using only video-level action labels, has attracted increasing research interest in the past few years. However, a model trained with such labels tends to focus on the segments that contribute most to the video-level classification, leading to inaccurate and incomplete localization results. In this paper, we tackle the problem from a novel perspective of relation modeling and propose a method dubbed Bilateral Relation Distillation (BRD). The core of our method involves learning representations by jointly modeling relations at the category and sequence levels. Specifically, category-wise latent segment representations are first obtained by different embedding networks, one for each category. We then distill knowledge from a pre-trained language model to capture the category-level relations, which is achieved by performing correlation alignment and category-aware contrast in an intra- and inter-video manner. To model the relations among segments at the sequence level, we devise a gradient-based feature augmentation method and encourage the learned latent representation of the augmented feature to be consistent with that of the original one. Extensive experiments show that our approach achieves state-of-the-art results on the THUMOS14 and ActivityNet1.3 datasets.
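
The sequence-level idea in the abstract (augment segment features using gradients, then keep the latent representation of the augmented features consistent with the original) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify the classifier, the augmentation rule, or the loss. The log-sum-exp temporal pooling, the normalized gradient step of size eps, the tanh embedding, and all function names below are assumptions chosen only to make the idea concrete.

```python
import numpy as np

def video_loss_and_grad(feats, W, label):
    """Video-level cross-entropy and its gradient w.r.t. segment features.

    Assumed setup (not from the paper): per-segment linear class scores,
    pooled over time with log-sum-exp to get video-level logits.
    feats: (T, D) segment features, W: (D, C) classifier weights.
    """
    S = feats @ W                                  # (T, C) segment scores
    m = S.max(axis=0)
    v = m + np.log(np.exp(S - m).sum(axis=0))      # (C,) video logits
    attn = np.exp(S - v)                           # (T, C), each column sums to 1
    z = v - v.max()
    probs = np.exp(z) / np.exp(z).sum()            # softmax over classes
    loss = -np.log(probs[label])
    dv = probs.copy()
    dv[label] -= 1.0                               # d loss / d video logits
    grad = (attn * dv) @ W.T                       # (T, D) d loss / d feats
    return loss, grad

def augment(feats, grad, eps=0.1):
    """Assumed augmentation rule: step each segment along its normalized gradient."""
    norms = np.linalg.norm(grad, axis=1, keepdims=True)
    return feats + eps * grad / (norms + 1e-8)

def embed(feats, E):
    """Hypothetical latent embedding standing in for a learned network."""
    return np.tanh(feats @ E)

def consistency_loss(feats, feats_aug, E):
    """Mean squared distance between latent codes of original and augmented features."""
    return float(np.mean((embed(feats, E) - embed(feats_aug, E)) ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(8, 16))               # 8 segments, 16-dim features
    W = 0.1 * rng.normal(size=(16, 5))             # 5 action categories
    E = rng.normal(size=(16, 4))
    loss, grad = video_loss_and_grad(feats, W, label=2)
    feats_aug = augment(feats, grad)
    print("classification loss:", loss)
    print("consistency loss:", consistency_loss(feats, feats_aug, E))
```

In this sketch the gradient is largest on the segments that the pooled classifier relies on most, so the perturbation targets exactly the segments that dominate the video-level decision. In training, the consistency term would be minimized alongside the classification loss; here it is only evaluated.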
