Article

Rich Visual Knowledge-Based Augmentation Network for Visual Question Answering

Journal

IEEE Transactions on Neural Networks and Learning Systems

Publisher

Institute of Electrical and Electronics Engineers (IEEE)
DOI: 10.1109/TNNLS.2020.3017530

Keywords

Feature extraction; Visualization; Knowledge based systems; Task analysis; Knowledge discovery; Semantics; Cognition; Knowledge base; object detection; self-attention; visual question answering (VQA)

Funding

  1. Fundamental Research Funds for the Central Universities [ZYGX2019J073]
  2. National Natural Science Foundation of China [61772116, 61872064, 61632007, 61602049, 61872067]
  3. Sichuan Science and Technology Program [2019JDTD0005, 2019YFH0016]
  4. Open Project of Zhejiang Lab [2019KD0AB05]
  5. Zhejiang Lab's International Talent Fund for Young Professionals

Abstract

The proposed framework, KAN, uses object-related knowledge and a knowledge graph to support the reasoning process of VQA, with an attention module that adaptively balances the importance of external knowledge against detected objects. Extensive experiments demonstrate that KAN achieves state-of-the-art performance on challenging VQA datasets and also benefits existing VQA baselines.
Visual question answering (VQA), which involves understanding an image and an accompanying question, has developed rapidly with the rise of deep learning in related fields such as natural language processing and computer vision. Existing works rely heavily on the knowledge contained in the dataset; however, some questions require more specialized cues beyond that dataset knowledge to be answered correctly. To address this issue, we propose a novel framework named the knowledge-based augmentation network (KAN) for VQA. We introduce object-related open-domain knowledge to assist question answering. Concretely, we extract richer visual information from images and introduce a knowledge graph to provide the common sense or experience necessary for the reasoning process. For these two augmented inputs, we design an attention module that adjusts itself according to the specific question, so that the importance of external knowledge against detected objects is balanced adaptively. Extensive experiments show that our KAN achieves state-of-the-art performance on three challenging VQA datasets, i.e., VQA v2, VQA-CP v2, and FVQA. In addition, our open-domain knowledge also benefits VQA baselines. Code is available at https://github.com/yyyanglz/KAN.
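
The abstract's central mechanism, an attention module that adaptively balances external knowledge against detected objects conditioned on the question, can be pictured with a minimal sketch. This assumes PyTorch; the module name QuestionGuidedFusion, all layer shapes, and the two-way softmax gate are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionGuidedFusion(nn.Module):
    """Illustrative sketch: question-conditioned weighting of detected-object
    features against external-knowledge features. Names and dimensions are
    assumptions, not taken from the paper's code."""

    def __init__(self, q_dim=1024, v_dim=2048, k_dim=300, hid=512):
        super().__init__()
        self.proj_q = nn.Linear(q_dim, hid)  # question embedding
        self.proj_v = nn.Linear(v_dim, hid)  # detected-object features
        self.proj_k = nn.Linear(k_dim, hid)  # knowledge-graph features
        self.att_v = nn.Linear(hid, 1)
        self.att_k = nn.Linear(hid, 1)
        self.gate = nn.Linear(q_dim, 2)      # balances the two sources

    def forward(self, q, v, k):
        # q: (B, q_dim), v: (B, Nv, v_dim), k: (B, Nk, k_dim)
        qh = self.proj_q(q).unsqueeze(1)               # (B, 1, hid)
        vh, kh = self.proj_v(v), self.proj_k(k)        # (B, N*, hid)
        # question-conditioned attention over objects and knowledge items
        a_v = F.softmax(self.att_v(torch.tanh(vh * qh)), dim=1)
        a_k = F.softmax(self.att_k(torch.tanh(kh * qh)), dim=1)
        v_att = (a_v * vh).sum(dim=1)                  # (B, hid)
        k_att = (a_k * kh).sum(dim=1)                  # (B, hid)
        # adaptive balance of external knowledge vs. detected objects
        w = F.softmax(self.gate(q), dim=-1)            # (B, 2)
        return w[:, :1] * v_att + w[:, 1:] * k_att     # (B, hid)
```

In this reading, the gate lets a question like "What is this animal used for?" lean on the knowledge branch, while "What color is the dog?" leans on the object branch; the fused vector would then feed an answer classifier.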

