Proceedings Paper (rating 3.8)

HAIR: Hierarchical Visual-Semantic Relational Reasoning for Video Question Answering

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) (2021)

Journal

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021)

Volume -, Issue -, Pages 1678-1687

Publisher

IEEE

DOI: 10.1109/ICCV48922.2021.00172

Categories

Computer Science, Artificial Intelligence; Computer Science, Theory & Methods

Funding

National Key Research and Development Program of China [2020AAA0106400]
National Natural Science Foundation of China [61922086, 61872366]
Beijing Natural Science Foundation [4192059, JQ20022]

Summary

The paper introduces a Hierarchical VisuAl-Semantic RelatIonal Reasoning (HAIR) framework for video question answering, which integrates visual and semantic knowledge through graph memory mechanisms. Experimental results demonstrate state-of-the-art performance, fewer parameters, and faster inference speed, as well as superior performance in other video+language tasks.

Abstract

Relational reasoning is at the heart of video question answering. However, existing approaches suffer from several common limitations: (1) they focus only on either object-level or frame-level relational reasoning, and fail to integrate both; and (2) they neglect to leverage semantic knowledge for relational reasoning. In this work, we propose a Hierarchical VisuAl-Semantic RelatIonal Reasoning (HAIR) framework to address these limitations. Specifically, we present a novel graph memory mechanism to perform relational reasoning, and further develop two types of graph memory: a) visual graph memory, which leverages the visual information of the video for relational reasoning; b) semantic graph memory, which is specifically designed to explicitly leverage the semantic knowledge contained in the classes and attributes of video objects, and to perform relational reasoning in the semantic space. Taking advantage of both graph memory mechanisms, we build a hierarchical framework to enable visual-semantic relational reasoning from object level to frame level. Experiments on four challenging benchmark datasets show that the proposed framework leads to state-of-the-art performance, with fewer parameters and faster inference speed. Our approach also shows superior performance on other video+language tasks.
