Article

CI-GNN: Building a Category-Instance Graph for Zero-Shot Video Classification

Journal

IEEE TRANSACTIONS ON MULTIMEDIA
Volume 22, Issue 12, Pages 3088-3100

Publisher

IEEE (Institute of Electrical and Electronics Engineers, Inc.)
DOI: 10.1109/TMM.2020.2969787

Keywords

Semantics; Task analysis; Visualization; Training; Message passing; Pattern recognition; Neural networks; Zero-shot video classification; graph neural network; zero-shot learning

Funding

  1. National Natural Science Foundation of China [61720106006, 61721004, 61832002, 61532009, U1705262, U1836220, 61702511]
  2. Key Research Program of Frontier Sciences, CAS [QYZDJSSWJSC039]
  3. Research Program of National Laboratory of Pattern Recognition [Z-2018007]

Abstract

With the ever-growing number of video categories, Zero-Shot Learning (ZSL) for video classification has drawn considerable attention in recent years. To transfer knowledge learned from seen categories to unseen ones, most existing methods resort to an implicit model that learns a projection between visual features and semantic category representations. However, such methods ignore the explicit relationships among video instances and categories, which impedes direct information propagation over a Category-Instance graph (CI-graph) consisting of both instances and categories. In fact, exploring the structure of the CI-graph can capture invariances of the ZSL task that generalize well to unseen instances. Inspired by these observations, we propose an end-to-end framework that directly and collectively models the category-instance, category-category, and instance-instance relationships in the CI-graph. Specifically, to construct the node features of this graph, we adopt object semantics as a bridge to generate unified representations for both videos and categories. Motivated by the favorable performance of Graph Neural Networks (GNNs), we design a Category-Instance GNN (CI-GNN) to adaptively model the structure of the CI-graph and propagate information between categories and videos. Through the task-driven message-passing process, the learned model can transfer label information from categories to unseen videos. Extensive experiments on four video datasets demonstrate the favorable performance of the proposed framework.
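The core idea of the abstract — propagating label information over a graph that mixes category nodes and instance nodes — can be sketched in a few lines. The graph below, the feature dimension, and the single identity-weight propagation layer are illustrative assumptions for this sketch, not the paper's actual architecture (CI-GNN learns its weights end-to-end from task supervision).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy CI-graph: 3 category nodes and 4 instance (video) nodes, all placed
# in a shared 8-d feature space standing in for the paper's unified
# object-semantic representations. Sizes and edges are illustrative.
d = 8
cat_feats = rng.normal(size=(3, d))
labels = [0, 0, 1, 2]                      # true category of each instance
ins_feats = cat_feats[labels] + 0.1 * rng.normal(size=(4, d))
X = np.vstack([cat_feats, ins_feats])      # node features, shape (7, d)

# Adjacency: category-instance edges, one instance-instance edge, self-loops.
A = np.eye(7)
for i, c in enumerate(labels):
    A[3 + i, c] = A[c, 3 + i] = 1.0        # category-instance edges
A[3, 4] = A[4, 3] = 1.0                    # instance-instance edge

# One message-passing layer: H = ReLU(D^-1 A X W). The identity weight
# keeps this sketch deterministic; a real model would learn W.
W = np.eye(d)
D_inv = np.diag(1.0 / A.sum(axis=1))
H = np.maximum(D_inv @ A @ X @ W, 0.0)

# After propagation, classify each instance by its nearest category node:
# label information has flowed from category nodes into instance nodes.
preds = [int(np.argmin(np.linalg.norm(H[:3] - H[3 + i], axis=1)))
         for i in range(4)]
print(preds)
```

At test time, unseen categories would simply be added as new category nodes built from the same object-semantic space, so the identical propagation rule transfers labels to unseen videos without retraining the projection.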
