Article

Rich Visual Knowledge-Based Augmentation Network for Visual Question Answering

Journal

IEEE Transactions on Neural Networks and Learning Systems

Publisher

Institute of Electrical and Electronics Engineers (IEEE)
DOI: 10.1109/TNNLS.2020.3017530

Keywords

Feature extraction; Visualization; Knowledge based systems; Task analysis; Knowledge discovery; Semantics; Cognition; Knowledge base; object detection; self-attention; visual question answering (VQA)

Funding

  1. Fundamental Research Funds for the Central Universities [ZYGX2019J073]
  2. National Natural Science Foundation of China [61772116, 61872064, 61632007, 61602049, 61872067]
  3. Sichuan Science and Technology Program [2019JDTD0005, 2019YFH0016]
  4. Open Project of Zhejiang Lab [2019KD0AB05]
  5. Zhejiang Lab's International Talent Fund for Young Professionals

Abstract

The proposed framework, KAN, uses object-related knowledge and a knowledge graph to support the reasoning process of VQA, with an attention module that adaptively balances the importance of external knowledge against detected objects. Extensive experiments demonstrate that KAN achieves state-of-the-art performance on challenging VQA datasets and also benefits existing VQA baselines.
Visual question answering (VQA), which involves understanding an image and an accompanying question, has developed rapidly with the rise of deep learning in related fields such as natural language processing and computer vision. Existing works rely heavily on the knowledge contained in the dataset; however, some questions require more specialized cues beyond that dataset knowledge to be answered correctly. To address this issue, we propose a novel framework named the knowledge-based augmentation network (KAN) for VQA. We introduce object-related open-domain knowledge to assist question answering. Concretely, we extract richer visual information from images and introduce a knowledge graph to provide the common sense or experience necessary for the reasoning process. For these two augmented inputs, we design an attention module that adjusts itself according to the specific question, so that the importance of external knowledge against detected objects is balanced adaptively. Extensive experiments show that our KAN achieves state-of-the-art performance on three challenging VQA datasets, i.e., VQA v2, VQA-CP v2, and FVQA. In addition, our open-domain knowledge also benefits VQA baselines. Code is available at https://github.com/yyyanglz/KAN.
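
The abstract's central mechanism, an attention module that adaptively balances external knowledge against detected objects conditioned on the question, can be pictured with a minimal sketch. This assumes PyTorch; the module name QuestionGuidedFusion, all layer shapes, and the two-way softmax gate are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionGuidedFusion(nn.Module):
    """Illustrative sketch: question-conditioned weighting of detected-object
    features against external-knowledge features. Names and dimensions are
    assumptions, not taken from the paper's code."""

    def __init__(self, q_dim=1024, v_dim=2048, k_dim=300, hid=512):
        super().__init__()
        self.proj_q = nn.Linear(q_dim, hid)  # question embedding
        self.proj_v = nn.Linear(v_dim, hid)  # detected-object features
        self.proj_k = nn.Linear(k_dim, hid)  # knowledge-graph features
        self.att_v = nn.Linear(hid, 1)
        self.att_k = nn.Linear(hid, 1)
        self.gate = nn.Linear(q_dim, 2)      # balances the two sources

    def forward(self, q, v, k):
        # q: (B, q_dim), v: (B, Nv, v_dim), k: (B, Nk, k_dim)
        qh = self.proj_q(q).unsqueeze(1)               # (B, 1, hid)
        vh, kh = self.proj_v(v), self.proj_k(k)        # (B, N*, hid)
        # question-conditioned attention over objects and knowledge items
        a_v = F.softmax(self.att_v(torch.tanh(vh * qh)), dim=1)
        a_k = F.softmax(self.att_k(torch.tanh(kh * qh)), dim=1)
        v_att = (a_v * vh).sum(dim=1)                  # (B, hid)
        k_att = (a_k * kh).sum(dim=1)                  # (B, hid)
        # adaptive balance of external knowledge vs. detected objects
        w = F.softmax(self.gate(q), dim=-1)            # (B, 2)
        return w[:, :1] * v_att + w[:, 1:] * k_att     # (B, hid)
```

In this reading, the gate lets a question like "What is this animal used for?" lean on the knowledge branch, while "What color is the dog?" leans on the object branch; the fused vector would then feed an answer classifier.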

