Article

Bilateral Relation Distillation for Weakly Supervised Temporal Action Localization

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPAMI.2023.3284853

Keywords

Bilateral relation distillation; weakly supervised learning; temporal action localization

This paper proposes Bilateral Relation Distillation (BRD), a method for weakly supervised temporal action localization. BRD learns representations by jointly modeling category-level and sequence-level relations. It captures category-level relations through correlation alignment and category-aware contrast, and it models relations among segments at the sequence level with a gradient-based feature augmentation method.
Weakly supervised temporal action localization (WSTAL), which aims to locate the time intervals of actions in an untrimmed video with only video-level action labels, has attracted increasing research interest in the past few years. However, a model trained with such labels tends to focus on the segments that contribute most to the video-level classification, leading to inaccurate and incomplete localization results. In this paper, we tackle the problem from a novel perspective of relation modeling and propose a method dubbed Bilateral Relation Distillation (BRD). The core of our method involves learning representations by jointly modeling the relations at the category and sequence levels. Specifically, category-wise latent segment representations are first obtained by different embedding networks, one for each category. We then distill knowledge obtained from a pre-trained language model to capture the category-level relations, which is achieved by performing correlation alignment and category-aware contrast in an intra- and inter-video manner. To model the relations among segments at the sequence level, we devise a gradient-based feature augmentation method and encourage the learned latent representation of the augmented feature to be consistent with that of the original one. Extensive experiments illustrate that our approach achieves state-of-the-art results on the THUMOS14 and ActivityNet1.3 datasets.
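
The abstract does not spell out the loss used for correlation alignment, so the following is only a minimal sketch of one plausible reading: build a category-by-category cosine-similarity matrix from the language-model embeddings of the category names, build the same matrix from per-category visual representations, and penalize the gap between the two. The function names, the mean-squared-error form, the array shapes, and the random data are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def cosine_similarity_matrix(x: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of x (shape: C x D)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)  # guard against zero-length rows
    return unit @ unit.T


def correlation_alignment_loss(visual: np.ndarray, text: np.ndarray) -> float:
    """Mean squared gap between visual and text category-correlation matrices.

    visual: (C, D_v), one pooled latent representation per category (assumed).
    text:   (C, D_t), one language-model embedding per category name (assumed).
    The two feature dimensions may differ; only the C x C matrices are compared.
    """
    gap = cosine_similarity_matrix(visual) - cosine_similarity_matrix(text)
    return float(np.mean(gap ** 2))


# Toy data standing in for real embeddings (purely illustrative).
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(4, 16))

# An orthogonal rotation preserves all pairwise cosine similarities,
# so these "visual" features carry exactly the same category relations.
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))
visual_aligned = text_emb @ rotation

# Unrelated random features have a different correlation structure.
visual_random = rng.normal(size=(4, 32))

loss_aligned = correlation_alignment_loss(visual_aligned, text_emb)
loss_random = correlation_alignment_loss(visual_random, text_emb)
```

In this sketch the loss depends only on how categories relate to one another, not on the feature spaces themselves, which is why the rotated features give a loss of essentially zero while the unrelated ones do not.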
