Article

Graph Convolutional Module for Temporal Action Localization in Videos

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPAMI.2021.3090167

Keywords

Temporal action localization; graph convolutional networks; video analysis

Funding

  1. Scientific Innovation 2030 Major Project for New Generation of AI [2020AAA0107300]
  2. Ministry of Science and Technology of the People's Republic of China
  3. National Natural Science Foundation of China (NSFC) [62072190]
  4. Key-Area Research and Development Program of Guangdong Province [2018B010107001]
  5. Program for Guangdong Introducing Innovative and Entrepreneurial Teams [2017ZT07X183]

Abstract

This paper proposes a general graph convolutional module (GCM) to enhance the performance of temporal action localization methods. The relations between action units are found to play an important role in action localization, and GCM effectively captures both the local content of each action unit and the context related to it.
Temporal action localization, which requires a machine to recognize the location as well as the category of action instances in videos, has long been researched in computer vision. The main challenge of temporal action localization is that videos are usually long and untrimmed, with diverse action content involved. Existing state-of-the-art action localization methods divide each video into multiple action units (i.e., proposals in two-stage methods and segments in one-stage methods) and then perform action recognition/regression on each of them individually, without explicitly exploiting their relations during learning. In this paper, we claim that the relations between action units play an important role in action localization, and that a more powerful action detector should not only capture the local content of each action unit but also allow a wider field of view on the context related to it. To this end, we propose a general graph convolutional module (GCM) that can be easily plugged into existing action localization methods, including both two-stage and one-stage paradigms. Specifically, we first construct a graph where each action unit is represented as a node and the relation between two action units as an edge. We use two types of relations: one captures the temporal connections between different action units, and the other characterizes their semantic relationship. For the temporal connections in two-stage methods in particular, we further explore two kinds of edges: one connecting overlapping action units and the other connecting surrounding but disjoint units. On the constructed graph, we then apply graph convolutional networks (GCNs) to model the relations among different action units, which learns more informative representations that enhance action localization. Experimental results show that our GCM consistently improves the performance of existing action localization methods, including two-stage methods (e.g., CBR [15] and R-C3D [47]) and one-stage methods (e.g., D-SSAD [22]), verifying the generality and effectiveness of our GCM. Moreover, with the aid of GCM, our approach significantly outperforms the state of the art on THUMOS14 (50.9 percent versus 42.8 percent). Augmentation experiments on ActivityNet also verify the efficacy of modeling the relations between action units. The source code and the pre-trained models are available at https://github.com/Alvin-Zeng/GCM.
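To make the graph construction described above concrete, here is a minimal PyTorch sketch, assuming proposal-style action units given as (start, end) intervals with fixed-size features. The names (build_adjacency, GCMSketch) and the thresholds are hypothetical, chosen only for illustration; the authors' actual implementation is in the GitHub repository linked above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def temporal_iou(a, b):
    """IoU between two temporal intervals given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0


def build_adjacency(intervals, features, iou_thresh=0.1, sim_thresh=0.8):
    """Build a normalized N x N adjacency matrix over action units.

    Temporal edges link units whose intervals overlap (IoU above
    iou_thresh); semantic edges link units with similar features
    (cosine similarity above sim_thresh). Both thresholds are
    illustrative placeholders, not the paper's tuned values.
    """
    n = len(intervals)
    adj = torch.eye(n)  # self-loops preserve each unit's own content
    normed = F.normalize(features, dim=1)
    sim = normed @ normed.t()  # pairwise cosine similarity
    for i in range(n):
        for j in range(i + 1, n):
            temporal = temporal_iou(intervals[i], intervals[j]) > iou_thresh
            semantic = sim[i, j].item() > sim_thresh
            if temporal or semantic:
                adj[i, j] = adj[j, i] = 1.0
    # Symmetric normalization D^{-1/2} A D^{-1/2}, standard for GCNs.
    d_inv_sqrt = adj.sum(dim=1).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)


class GCMSketch(nn.Module):
    """One graph-convolution layer over unit features: H' = ReLU(A H W)."""

    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, feats, adj):
        return torch.relu(adj @ self.proj(feats))


# Usage: 10 hypothetical proposals with 256-d features.
feats = torch.randn(10, 256)
intervals = [(float(i), float(i) + 2.0) for i in range(10)]
adj = build_adjacency(intervals, feats)
enhanced = GCMSketch(256)(feats, adj)  # relation-aware features, shape (10, 256)
```

The symmetric normalization keeps aggregated features on a comparable scale regardless of how many neighbors a unit has. In a full detector, a module of this kind would sit between the backbone features and the recognition/regression heads of a two-stage or one-stage pipeline, so the heads operate on relation-aware rather than isolated unit representations.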
