Article

Visual question answering model based on graph neural network and contextual attention

IMAGE AND VISION COMPUTING (2021)

Journal

IMAGE AND VISION COMPUTING

Volume 110

Publisher

ELSEVIER

DOI: 10.1016/j.imavis.2021.104165

Keywords

Visual question answering; Computer vision; Natural language processing; Attention

Automated Summary

Visual Question Answering (VQA) is an emerging research area in computer vision and natural language processing that aims to predict answers to natural-language questions about images. However, current VQA approaches often overlook the relationships and reasoning among regions of interest. The VQA model proposed in this paper also takes previously attended visual content into account, which improves the accuracy of answer prediction.

Abstract

Visual Question Answering (VQA) has recently emerged as an active research area in computer vision and natural language processing. A VQA model extracts image and question features and fuses them to predict an answer to a natural-language question about an image. However, most attention-based VQA approaches concentrate on extracting visual information from regions of interest for answer prediction and ignore both the relations between those regions and the reasoning among them. These approaches also ignore the regions that were previously attended during answer generation, even though previously attended regions can guide the selection of subsequent regions of attention. This paper presents a novel VQA model that exploits the relationships between regions and employs visual-context-based attention that takes previously attended visual content into account. Experimental results demonstrate that the proposed VQA model improves answer-prediction accuracy on the publicly available VQA 1.0 and VQA 2.0 datasets. (c) 2021 Elsevier B.V. All rights reserved.
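
The idea that previously attended regions can guide later attention can be illustrated with a minimal sketch. It is not the paper's model: the abstract gives no equations, so the additive query update, the dot-product scoring, the function name `context_attention`, and the parameters `steps` and `context_weight` are all assumptions made for illustration. The graph-based relation modelling between regions is not covered here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def context_attention(regions, question, steps=2, context_weight=0.5):
    """Hypothetical multi-step attention over region feature vectors.

    At each step the query is the question vector plus a weighted copy of
    the previously attended visual summary (an assumed update rule), so
    earlier attention influences which regions are selected next.
    """
    dim = len(question)
    context = [0.0] * dim  # nothing has been attended before the first step
    history = []
    for _ in range(steps):
        query = [q + context_weight * c for q, c in zip(question, context)]
        weights = softmax([dot(r, query) for r in regions])
        # The attended summary is the attention-weighted sum of region features.
        context = [sum(w * r[i] for w, r in zip(weights, regions))
                   for i in range(dim)]
        history.append(weights)
    return context, history
```

With `context_weight=0.0` every step reduces to plain question-guided attention and yields the same weights; a non-zero value lets the summary from the previous step shift the weights at the next one.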
