4.7 Article

SAM: Modeling Scene, Object and Action With Semantics Attention Modules for Video Recognition

Journal

IEEE TRANSACTIONS ON MULTIMEDIA
Volume 24, Pages 313-322

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TMM.2021.3050058

Keywords

Video recognition; scene; object; feature fusion; semantics attention

Funding

  1. National Key R&D Program of China [2018YFB1004300]

Video recognition aims to understand semantic content involving interactions between humans and related objects in specific scenes. The fusion of object, scene, and action features is commonly used to improve recognition accuracy. In this paper, the authors propose a method that breaks down the fusion of the three features into two pairwise feature relation modeling processes, which helps overcome the challenge of correlation learning in high-dimensional features. The proposed method achieves better results with less computational effort than alternative methods.
Video recognition aims at understanding semantic content that normally involves interactions between humans and related objects in certain scenes. A common practice for improving recognition accuracy is to combine object, scene, and action features directly for classification, on the assumption that they are explicitly complementary. In this paper, we break down the fusion of the three features into two pairwise feature relation modeling processes, which mitigates the difficulty of correlation learning in high-dimensional features. To this end, we introduce a Semantics Attention Module (SAM) that captures the relation between a pair of features by refining the relatively weak feature under the guidance of the strong feature using attention mechanisms. The refined representation is then combined with the strong feature using a residual design for downstream tasks. Two SAMs are applied in a Semantics Attention Network (SAN) to improve video recognition. Extensive experiments are conducted on two large-scale video benchmarks, FCVID and ActivityNet v1.3. The proposed approach achieves better results while requiring much less computational effort than alternative methods.
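
The pairwise refinement described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the sigmoid gating form, the projection matrices, the choice of the action feature as the "strong" feature in both pairs, the cascaded order of the two modules, and all function and parameter names (`semantics_attention`, `w_gate`, `w_proj`, `params`) are assumptions made purely for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def semantics_attention(strong, weak, w_gate, w_proj):
    """Hypothetical SAM-style block (assumed form, not from the paper).

    strong: (batch, d_strong) guiding feature
    weak:   (batch, d_weak) feature to be refined
    w_gate: (d_strong, d_weak) maps the strong feature to attention weights
    w_proj: (d_weak, d_strong) projects the refined weak feature back
    """
    # Attention weights in (0, 1), computed from the strong feature.
    gate = sigmoid(strong @ w_gate)
    # Refine the weak feature under that guidance, then project it.
    refined = (gate * weak) @ w_proj
    # Residual design: the strong feature is kept and the refinement added.
    return strong + refined


def semantics_attention_network(action, obj, scene, params):
    """Two pairwise modules instead of one three-way fusion.

    Assumption: the action feature guides both pairs, applied in cascade.
    """
    fused = semantics_attention(action, obj, *params["object"])
    fused = semantics_attention(fused, scene, *params["scene"])
    return fused
```

With every `w_proj` set to zero the network returns the action feature unchanged, which is the property the residual design is meant to provide: the refined weak features can only add to the strong representation, not replace it.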
