4.7 Article

Multi-level alignment for few-shot temporal action localization

Journal

INFORMATION SCIENCES
Volume 650, Issue -, Pages -

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2023.119618

Keywords

Few-shot learning; Temporal action localization; Feature alignment; Cosine similarity

This study proposes a new few-shot learning method for temporal action localization in long videos. The method uses a multi-level encoder cosine-similarity alignment module to capture the alignment of visual information, and it incorporates cosine similarity into the Transformer encoder layers to emphasize refined features. With an episodic training scheme, it learns to align similar video snippets and adapts to novel classes at test time.
Temporal action localization (TAL), which aims to localize actions in long untrimmed videos, requires a large amount of annotated training data. However, segment-level annotations are expensive to obtain for large-scale datasets. To overcome this challenge, a new few-shot learning method is proposed that localizes temporal actions for unseen classes with only a few training samples. In this study, a new multi-level encoder cosine-similarity alignment module is adopted that exploits the alignment of visual information at each temporal location. The proposed method arranges the video snippets that contain similar foreground action instances, and it captures intra-class variations more implicitly. In addition, it incorporates cosine similarity into the Transformer encoder layers to support the self-attention mechanism, which places more emphasis on refined features at the higher encoder layers. To this end, an episodic training scheme is adopted to learn the alignment of similar video snippets from a few training examples. At test time, the learned context information is then adapted to novel classes. Experimental results show that the proposed method outperforms state-of-the-art methods for few-shot temporal action localization with single and multiple action instances on the ActivityNet-1.3 dataset and achieves competitive results on the THUMOS-14 and HACS datasets.
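
The abstract describes aligning video snippets by cosine similarity at each temporal location. The following is a minimal sketch of that general idea only, not the authors' module: it assumes each snippet is already encoded as a feature vector, and it uses a hard best-match rule that is our own simplification. The function names (cosine_similarity, similarity_matrix, align_snippets) and the toy feature values are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(query, support):
    """Pairwise cosine similarity: rows are query snippets, columns are support snippets."""
    return [[cosine_similarity(q, s) for s in support] for q in query]


def align_snippets(query, support):
    """For each query snippet, return (index of most similar support snippet, similarity).

    Assumption: hard argmax matching is an illustrative simplification; the
    paper's alignment operates inside Transformer encoder layers.
    """
    alignment = []
    for row in similarity_matrix(query, support):
        best = max(range(len(row)), key=row.__getitem__)
        alignment.append((best, row[best]))
    return alignment


# Toy example: three query snippets and two support snippets, 2-D features.
query = [[1.0, 0.0], [0.0, 2.0], [1.0, 3.0]]
support = [[0.0, 1.0], [3.0, 0.0]]
alignment = align_snippets(query, support)
for t, (idx, sim) in enumerate(alignment):
    print(f"query snippet {t} -> support snippet {idx} (cosine {sim:.3f})")
```

Because cosine similarity ignores vector magnitude, the second query snippet matches the first support snippet exactly even though their norms differ. In an episodic setting, a score matrix of this kind would be recomputed for each sampled support/query pair.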
