Journal
PATTERN RECOGNITION
Volume 120, Issue -, Pages -
Publisher
ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2021.108153
Keywords
Faster R-CNN; DBpedia spotlight; knowledge base; VQA
Funding
- Sichuan Science and Technology Program [2019YJ0189, 2021YFQ0003]
This paper constructs a knowledge base graph embedding module to extend the versatility of knowledge-based VQA models. By extracting core entities from images and text and performing sub-graph embedding, the accuracy of knowledge-based VQA models is improved.
In this paper, a knowledge base graph embedding module is constructed to extend the versatility of knowledge-based VQA (Visual Question Answering) models. The module extracts core entities from images and text, maps them to knowledge base entities, extracts the sub-graphs closely related to those core entities, and converts the sub-graphs into low-dimensional vectors to realize sub-graph embedding. To achieve good sub-graph embedding, we first extracted two semantically rich experimental knowledge bases from DBpedia: DBV and DBA. Based on these two knowledge bases, we selected several strong knowledge base embedding models as test models for link prediction, including SE (structured embedding), SME (semantic matching energy function), and TransE. The results show a clear correspondence between the entities of DBV, which enables excellent node embedding, and show that the TransE model achieves a good knowledge base embedding, so we built the knowledge base graph embedding module on TransE. We then constructed a VQA model (KBSN) based on the knowledge base graph embedding. Experimental results on the VQA2.0 and KB-VQA data sets show that the knowledge base graph embedding module improves accuracy. (c) 2021 Elsevier Ltd. All rights reserved.
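The abstract names TransE as the basis of the embedding module and link prediction as the evaluation task. As a rough illustration only, the sketch below shows the standard TransE idea: a triple (head, relation, tail) is plausible when head + relation lies close to tail, and link prediction ranks candidate tails by that distance. The entity names, the two-dimensional vectors, and the function names are hypothetical toy values chosen for this example; they are not taken from DBV, DBA, or the paper's implementation, and no training is performed here.

```python
import math


def transe_distance(head, relation, tail):
    """L2 distance ||head + relation - tail||; lower means more plausible."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def margin_loss(positive_distance, negative_distance, margin=1.0):
    """Margin ranking loss commonly used to train TransE.

    It is zero once the true triple is closer than the corrupted
    triple by at least `margin`.
    """
    return max(0.0, margin + positive_distance - negative_distance)


def rank_tails(head, relation, candidates):
    """Link prediction: order candidate tail names from most to least plausible."""
    return sorted(
        candidates,
        key=lambda name: transe_distance(head, relation, candidates[name]),
    )


# Hypothetical, hand-set toy embeddings (assumption: not learned, not from the paper).
entities = {
    "Paris": [1.0, 0.0],
    "France": [1.0, 1.0],
    "Japan": [3.0, 2.0],
}
capital_of = [0.0, 1.0]  # hypothetical relation vector

d_pos = transe_distance(entities["Paris"], capital_of, entities["France"])
d_neg = transe_distance(entities["Paris"], capital_of, entities["Japan"])

candidate_tails = {name: vec for name, vec in entities.items() if name != "Paris"}
ranking = rank_tails(entities["Paris"], capital_of, candidate_tails)

print(d_pos, d_neg, ranking, margin_loss(d_pos, d_neg))
```

In a real setting the vectors would be learned by minimizing the margin loss over many true and corrupted triples; the sub-graph embedding described in the abstract would then be built from the learned entity vectors, by a method this sketch does not attempt to reproduce.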