Article

Bilateral Relation Distillation for Weakly Supervised Temporal Action Localization

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)

Journal

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

Volume 45, Issue 10, Pages 11458-11471

Publisher

IEEE COMPUTER SOC

DOI: 10.1109/TPAMI.2023.3284853

Keywords

Bilateral relation distillation; weakly supervised learning; temporal action localization

Categories

Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic

Summary

This paper proposes a method called Bilateral Relation Distillation (BRD) for weakly supervised temporal action localization. The method learns representations by jointly modeling category-level and sequence-level relations. It captures category-level relations through correlation alignment and category-aware contrast, and it uses a gradient-based feature augmentation method to model relations among segments at the sequence level.

Abstract

Weakly supervised temporal action localization (WSTAL), which aims to locate the time intervals of actions in an untrimmed video using only video-level action labels, has attracted increasing research interest in the past few years. However, a model trained with such labels tends to focus on the segments that contribute most to the video-level classification, leading to inaccurate and incomplete localization results. In this paper, we tackle the problem from a novel perspective of relation modeling and propose a method dubbed Bilateral Relation Distillation (BRD). The core of our method involves learning representations by jointly modeling relations at the category and sequence levels. Specifically, category-wise latent segment representations are first obtained by different embedding networks, one for each category. We then distill knowledge obtained from a pre-trained language model to capture the category-level relations, which is achieved by performing correlation alignment and category-aware contrast in an intra- and inter-video manner. To model the relations among segments at the sequence level, we design a gradient-based feature augmentation method and encourage the learned latent representation of the augmented feature to be consistent with that of the original one. Extensive experiments show that our approach achieves state-of-the-art results on the THUMOS14 and ActivityNet1.3 datasets.
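The abstract names two training signals without giving their exact form: correlation alignment between category-level representations and language-model knowledge, and a consistency constraint between the latent representations of original and augmented features. The following is a minimal sketch of how such objectives might look, not the paper's implementation. The use of cosine similarity and mean squared error is an assumption, all function names are hypothetical, and the gradient-based augmentation itself is not implemented here (the augmented latents are simply passed in).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def correlation_matrix(vectors):
    """Pairwise cosine similarities; entry [i][j] relates category i to category j."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


def correlation_alignment_loss(visual_protos, text_embeds):
    """Assumed form: mean squared difference between the category correlation
    matrix of the visual representations and that of the text embeddings."""
    corr_visual = correlation_matrix(visual_protos)
    corr_text = correlation_matrix(text_embeds)
    n = len(corr_visual)
    total = sum(
        (corr_visual[i][j] - corr_text[i][j]) ** 2
        for i in range(n)
        for j in range(n)
    )
    return total / (n * n)


def consistency_loss(latent_original, latent_augmented):
    """Assumed form: mean squared distance between per-segment latents of the
    original feature sequence and of its augmented counterpart."""
    total, count = 0.0, 0
    for seg_o, seg_a in zip(latent_original, latent_augmented):
        for a, b in zip(seg_o, seg_a):
            total += (a - b) ** 2
            count += 1
    return total / count


# Toy data: three categories, one vector per category (hypothetical values).
text_embeds = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
visual_protos = [[0.5, 0.5], [0.0, 1.0], [1.0, 0.0]]

align = correlation_alignment_loss(visual_protos, text_embeds)
consist = consistency_loss([[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [1.0, 2.0]])
```

In this sketch the alignment term is zero when the visual category representations have the same pairwise similarity structure as the text embeddings, so minimizing it pushes related action categories (as judged by the language model) toward related visual representations. The consistency term is zero when augmentation leaves the latent representation unchanged.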
