Journal
IEEE TRANSACTIONS ON MULTIMEDIA
Volume 22, Issue 12, Pages 3088-3100
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TMM.2020.2969787
Keywords
Semantics; Task analysis; Visualization; Training; Message passing; Pattern recognition; Neural networks; Zero-shot video classification; graph neural network; zero-shot learning
Funding
- National Natural Science Foundation of China [61720106006, 61721004, 61832002, 61532009, U1705262, U1836220, 61702511]
- Key Research Program of Frontier Sciences, CAS [QYZDJSSWJSC039]
- Research Program of National Laboratory of Pattern Recognition [Z-2018007]
With the ever-growing number of video categories, Zero-Shot Learning (ZSL) for video classification has drawn considerable attention in recent years. To transfer knowledge learned from seen categories to unseen categories, most existing methods resort to an implicit model that learns a projection between visual features and semantic category representations. However, such methods ignore the explicit relationships among video instances and categories, which impedes direct information propagation in a Category-Instance graph (CI-graph) consisting of both instances and categories. In fact, exploring the structure of the CI-graph can capture the invariances of the ZSL task with good generality for unseen instances. Inspired by these observations, we propose an end-to-end framework to directly and collectively model the category-instance, category-category, and instance-instance relationships in the CI-graph. Specifically, to construct node features of this graph, we adopt object semantics as a bridge to generate unified representations for both videos and categories. Motivated by the favorable performance of Graph Neural Networks (GNNs), we design a Category-Instance GNN (CI-GNN) to adaptively model the structure of the CI-graph and propagate information among categories and videos. Through the task-driven message passing process, the learned model is able to transfer label information from categories to unseen videos. Extensive experiments on four video datasets demonstrate the favorable performance of the proposed framework.
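The abstract describes message passing over a graph whose nodes are both categories and video instances, embedded in a shared semantic space. The following is a minimal illustrative sketch of one such message-passing step, not the paper's actual CI-GNN: the graph layout, dimensions, and function names are assumptions for illustration only.

```python
import numpy as np

# Hypothetical sketch of one message-passing step over a Category-Instance
# graph (CI-graph): nodes are categories and video instances with unified
# semantic features; edges connect instances to candidate categories and
# to related instances. All names and dimensions are illustrative.

def message_pass(node_feats: np.ndarray, adj: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One GNN update: mean-aggregate neighbor features, transform, ReLU.

    node_feats: (N, D) unified semantic features for category + instance nodes.
    adj:        (N, N) adjacency matrix with self-loops.
    weight:     (D, D) transform (learnable in a real model; fixed here).
    """
    deg = adj.sum(axis=1, keepdims=True)            # node degrees
    agg = (adj @ node_feats) / np.maximum(deg, 1)   # mean aggregation over neighbors
    return np.maximum(agg @ weight, 0.0)            # linear transform + ReLU

# Toy CI-graph: 2 category nodes + 3 instance nodes, D = 4 semantic dims.
rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 4))
adj = np.array([[1, 0, 1, 1, 0],     # category 0 <-> instances 2, 3
                [0, 1, 0, 1, 1],     # category 1 <-> instances 3, 4
                [1, 0, 1, 0, 0],
                [1, 1, 0, 1, 0],
                [0, 1, 0, 0, 1]], dtype=float)
W = np.eye(4)
updated = message_pass(feats, adj, W)
print(updated.shape)  # (5, 4)
```

After repeated steps, label information attached to category nodes diffuses along the edges toward instance nodes, which is the intuition behind transferring labels to unseen videos.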