Article

Holographic Feature Learning of Egocentric-Exocentric Videos for Multi-Domain Action Recognition

Journal

IEEE TRANSACTIONS ON MULTIMEDIA
Volume 24, Issue -, Pages 2273-2286

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TMM.2021.3078882

Keywords

Videos; Feature extraction; Visualization; Task analysis; Computational modeling; Target recognition; Prototypes; Egocentric videos; exocentric videos; holographic feature; multi-domain; action recognition

Funding

  1. National Key Research and Development Program of China [2018AAA0100604]
  2. National Natural Science Foundation of China [61720106006, 61721004, 62072455, U1836220, U1705262, 61872424]
  3. Key Research Program of Frontier Sciences of CAS [QYZDJ-SSW-JSC039]
  4. Beijing Natural Science Foundation [L201001]

Abstract

This paper proposes a method for multi-domain action recognition on egocentric-exocentric videos, learning a single model by transferring knowledge between the two domains. Videos are mapped into a global feature space that combines view-invariant and view-specific visual knowledge.
Although existing cross-domain action recognition methods successfully improve performance on videos of one view (e.g., egocentric videos) by transferring knowledge from videos of another view (e.g., exocentric videos), their generality is limited because the source and target domains must be fixed beforehand. In this paper, we address the more practical task of multi-domain action recognition on egocentric-exocentric videos, which aims to learn a single model that recognizes test videos from either the egocentric or the exocentric perspective by transferring knowledge between the two domains. Although previous cross-domain methods can also transfer knowledge from one domain to another by learning view-invariant representations of the two video domains, they are ill-suited to multi-domain action recognition because they lose view-specific visual information. As a solution, we propose to map a video from either perspective into a global feature space (which we call a holographic feature space) that shares both the view-invariant and the view-specific visual knowledge of the two views. Specifically, we decompose the video feature into a view-invariant component and a view-specific component, where the view-specific component is written into memory networks that preserve view-specific visual knowledge. The final holographic feature combines the view-invariant feature with the view-specific features of both views retrieved from the memory networks. Extensive experiments on two public datasets demonstrate the effectiveness of the proposed method, and strong performance under the semi-supervised setting shows the generality of our model.
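The decompose-store-fuse idea in the abstract can be sketched in code. The following is a minimal, hypothetical PyTorch sketch, not the authors' released implementation: the module structure, the dimensions (feat_dim, mem_slots), and the soft-attention memory read are all assumptions made only to illustrate how a view-invariant component and per-view memory banks could be combined into a single holographic feature.

```python
# Hypothetical sketch (assumed names and design, not the paper's code) of the
# holographic feature construction described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HolographicFeature(nn.Module):
    """Decomposes a video feature into view-invariant and view-specific
    parts, keeps a memory bank per view for view-specific knowledge, and
    fuses everything into one holographic feature."""

    def __init__(self, feat_dim: int = 512, mem_slots: int = 64):
        super().__init__()
        # Separate projections split the backbone feature into components.
        self.invariant_proj = nn.Linear(feat_dim, feat_dim)
        self.specific_proj = nn.Linear(feat_dim, feat_dim)
        # One learnable memory bank per view (0: egocentric, 1: exocentric)
        # intended to accumulate view-specific visual knowledge in training.
        self.memories = nn.ParameterList(
            [nn.Parameter(torch.randn(mem_slots, feat_dim)) for _ in range(2)]
        )

    def read_memory(self, query: torch.Tensor, view: int) -> torch.Tensor:
        # Soft-attention read: weight memory slots by similarity to the query.
        attn = F.softmax(query @ self.memories[view].t(), dim=-1)
        return attn @ self.memories[view]

    def forward(self, video_feat: torch.Tensor) -> torch.Tensor:
        # video_feat: (batch, feat_dim) from either view's backbone.
        invariant = self.invariant_proj(video_feat)
        specific = self.specific_proj(video_feat)
        # Query BOTH views' memories so the fused feature carries the
        # view-specific knowledge of the two domains, per the abstract.
        ego_specific = self.read_memory(specific, view=0)
        exo_specific = self.read_memory(specific, view=1)
        # Holographic feature: view-invariant part plus view-specific
        # parts of both views.
        return torch.cat([invariant, ego_specific, exo_specific], dim=-1)
```

In this reading, a video from either perspective is mapped into the same fused space, so one classifier head can serve both egocentric and exocentric test videos, which is the multi-domain setting the paper targets.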
