4.7 Article

STSM: Spatio-Temporal Shift Module for Efficient Action Recognition

Journal

MATHEMATICS
Volume 10, Issue 18, Pages -

Publisher

MDPI
DOI: 10.3390/math10183290

Keywords

spatio-temporal features; shift operation; action recognition; 2D convolution

Funding

  1. National Natural Science Foundation of China [62072028, 61772067]


The paper addresses the modeling, computational complexity, and accuracy of spatio-temporal models in video action recognition. It proposes a plug-and-play Spatio-Temporal Shift Module (STSM) that enhances a network's ability to learn spatio-temporal features without increasing the number of parameters or the computational complexity. When integrated with 2D CNNs, the resulting network can learn spatio-temporal features and outperform networks based on 3D convolutions.
Modeling, computational complexity, and accuracy are the three major concerns for spatio-temporal models in video action recognition. Traditional 2D convolution has low computational complexity, but it cannot capture temporal relationships. 3D convolution can achieve good performance, but at the cost of high computational complexity and a large number of parameters. In this paper, we propose a plug-and-play Spatio-Temporal Shift Module (STSM) that is both effective and high-performing. STSM can be easily inserted into other networks to enhance their ability to learn spatio-temporal features, improving performance without increasing the number of parameters or the computational complexity. In particular, when 2D CNNs and STSM are integrated, the new network may learn spatio-temporal features and outperform networks based on 3D convolutions. We revisit the shift operation from the perspective of matrix algebra: the spatio-temporal shift operation is a convolution with a sparse convolution kernel. Furthermore, we extensively evaluate the proposed module on the Kinetics-400 and Something-Something V2 datasets. The experimental results show the effectiveness of the proposed STSM, and the proposed action recognition networks may also achieve state-of-the-art results on these two benchmarks.
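The claim that a shift is a convolution with a sparse kernel can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify STSM's exact shift pattern, so the sketch assumes a simple per-channel shift of one step along the time axis with zero padding. The function names, the `(T, C, H, W)` layout, and the `fold_div` parameter are hypothetical choices made for illustration. A shift along a spatial axis would follow the same pattern on a different dimension.

```python
import numpy as np

def temporal_shift(x, fold_div=8):
    """Shift some channels one step along time (zero-padded at the ends).

    x has the assumed layout (T, C, H, W). fold_div is a hypothetical knob:
    the first C // fold_div channels read from the previous frame, the next
    C // fold_div read from the following frame, and the rest are unchanged.
    """
    fold = x.shape[1] // fold_div
    out = np.zeros_like(x)
    out[1:, :fold] = x[:-1, :fold]                   # out[t] = x[t-1]
    out[:-1, fold:2 * fold] = x[1:, fold:2 * fold]   # out[t] = x[t+1]
    out[:, 2 * fold:] = x[:, 2 * fold:]              # identity
    return out

def shift_as_sparse_conv(x, fold_div=8):
    """Same result as a per-channel temporal correlation with one-hot kernels."""
    t, c = x.shape[0], x.shape[1]
    fold = c // fold_div
    # One size-3 kernel per channel; each has exactly one non-zero entry.
    kernels = np.zeros((c, 3), dtype=x.dtype)
    kernels[:fold, 0] = 1           # tap on x[t-1]
    kernels[fold:2 * fold, 2] = 1   # tap on x[t+1]
    kernels[2 * fold:, 1] = 1       # tap on x[t]
    # Zero-pad one frame on each side of the time axis.
    padded = np.pad(x, ((1, 1), (0, 0), (0, 0), (0, 0)))
    out = np.zeros_like(x)
    for k in range(3):
        # padded[k:k + t] holds x[t + k - 1] for every output frame t.
        out += kernels[:, k][None, :, None, None] * padded[k:k + t]
    return out

x = np.random.rand(4, 16, 2, 2)
same = np.allclose(temporal_shift(x), shift_as_sparse_conv(x))
```

Because every kernel contains a single 1 and is otherwise zero, the convolution form introduces no learnable parameters and reduces to pure memory movement, which is consistent with the abstract's point that the module adds neither parameters nor computation.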
