Article

STSM: Spatio-Temporal Shift Module for Efficient Action Recognition

MATHEMATICS (2022)

Journal

MATHEMATICS

Volume 10, Issue 18, Pages -

Publisher

MDPI

DOI: 10.3390/math10183290

Keywords

spatio-temporal features; shift operation; action recognition; 2D convolution

Category

Mathematics

Funding

National Natural Science Foundation of China [62072028, 61772067]

Smart Summary

The paper addresses the modeling, computational complexity, and accuracy of spatio-temporal models in video action recognition. It proposes a plug-and-play Spatio-Temporal Shift Module (STSM) that enhances a network's ability to learn spatio-temporal features without increasing the number of parameters or the computational complexity. When STSM is integrated with 2D CNNs, the resulting network can learn spatio-temporal features and outperform networks based on 3D convolutions.

Abstract

The modeling, computational complexity, and accuracy of spatio-temporal models are the three major foci in the field of video action recognition. Traditional 2D convolution has low computational complexity, but it cannot capture temporal relationships. 3D convolution can achieve good performance, but it comes with both high computational complexity and a large number of parameters. In this paper, we propose a plug-and-play Spatio-Temporal Shift Module (STSM), which is both effective and high-performing. STSM can be easily inserted into other networks to enhance their ability to learn spatio-temporal features, improving performance without increasing the number of parameters or the computational complexity. In particular, when 2D CNNs and STSM are integrated, the new network can learn spatio-temporal features and may outperform networks based on 3D convolutions. We revisit the shift operation from the perspective of matrix algebra, i.e., the spatio-temporal shift operation is a convolution operation with a sparse convolution kernel. Furthermore, we extensively evaluate the proposed module on the Kinetics-400 and Something-Something V2 datasets. The experimental results show the effectiveness of the proposed STSM, and the proposed action recognition networks may also achieve state-of-the-art results on the two action recognition benchmarks.
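The claim that a shift is a convolution with a sparse kernel can be illustrated with a minimal NumPy sketch. The abstract does not specify how STSM assigns channels to shift directions, so the channel layout below (the `plan` list, the `fold_div` parameter, and the `(T, C, H, W)` tensor layout) is an assumption made for illustration only, not the authors' implementation. The helper names `shift`, `spatio_temporal_shift`, and `conv_time_3tap` are likewise hypothetical.

```python
import numpy as np

def shift(x, axis, step):
    """Shift x by `step` positions along `axis`, zero-filling vacated slots."""
    n = x.shape[axis]
    out = np.zeros_like(x)
    if step == 0:
        return x.copy()
    if abs(step) >= n:
        return out
    src = [slice(None)] * x.ndim
    dst = [slice(None)] * x.ndim
    if step > 0:
        src[axis], dst[axis] = slice(0, n - step), slice(step, n)
    else:
        src[axis], dst[axis] = slice(-step, n), slice(0, n + step)
    out[tuple(dst)] = x[tuple(src)]
    return out

def spatio_temporal_shift(x, fold_div=8):
    """Parameter-free shift over a (T, C, H, W) tensor.

    Assumed layout (not from the paper): six channel groups of size
    C // fold_div are shifted by +1/-1 along time, height, and width;
    the remaining channels are left unchanged.
    """
    T, C, H, W = x.shape
    fold = C // fold_div
    out = x.copy()
    plan = [(0, 1), (0, -1), (2, 1), (2, -1), (3, 1), (3, -1)]  # (axis, step)
    for g, (axis, step) in enumerate(plan):
        ch = slice(g * fold, (g + 1) * fold)
        out[:, ch] = shift(x[:, ch], axis, step)
    return out

def conv_time_3tap(x, w):
    """Zero-padded 3-tap cross-correlation along axis 0:
    y[t] = sum_k w[k] * x[t + k - 1]."""
    pad = [(1, 1)] + [(0, 0)] * (x.ndim - 1)
    xp = np.pad(x, pad)
    T = x.shape[0]
    return sum(w[k] * xp[k:k + T] for k in range(3))

x = np.arange(4 * 8 * 3 * 3, dtype=float).reshape(4, 8, 3, 3)
# A one-hot (sparse) kernel reproduces the +1 temporal shift exactly.
same = np.array_equal(shift(x, 0, 1), conv_time_3tap(x, [1.0, 0.0, 0.0]))
y = spatio_temporal_shift(x)
```

Because the kernel `[1, 0, 0]` has a single non-zero tap, the convolution reduces to a memory copy with an offset, which is why a shift of this kind adds no learnable parameters and no multiply-accumulate operations.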
