Article

Comprehensive-perception dynamic reasoning for visual question answering

Journal

PATTERN RECOGNITION
Volume 131

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2022.108878

Keywords

Cross-modal information fusion; Visual question answering; Comprehensive perception; Relational reasoning

Funding

  1. Beijing Natural Science Foundation [4222032]
  2. BUPT Excellent Ph.D. Students Foundation [CX2022225]


The goal of Visual Question Answering (VQA) is to answer questions about an image. Reasoning plays an important role in the VQA task because handling relations places high demands on modeling complex features. In most existing models, features are extracted and integrated only between adjacent layers, a pattern that arguably limits the completeness of information interaction during reasoning. In this paper, we propose a comprehensive-perception dynamic reasoning (CPDR) model that utilizes cross-layer object features for multi-step compound reasoning. It iteratively computes the interactions among the object features from all previous layers and integrates these interactions to generate new object features. Finally, the object features of all layers are used for the final prediction. Empirical results show that our model achieves superior performance among VQA models that are not based on vision-language pre-training (VLP), and that incorporating the CPDR module into VLP models brings considerable performance improvements. (C) 2022 Elsevier Ltd. All rights reserved.
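The abstract's reasoning loop, in which each new set of object features is produced from the interactions with features of all previous layers rather than only the adjacent one, can be sketched as below. This is a minimal illustration under assumed dimensions and attention-style interaction weights; the function name `cpdr_sketch` and all shapes are hypothetical, not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cpdr_sketch(obj_feats, num_steps=3, seed=0):
    """Hypothetical cross-layer reasoning loop.

    obj_feats: (num_objects, dim) initial object features.
    Returns a (num_objects, dim) representation pooled over ALL layers,
    mirroring the abstract's 'features of all layers' prediction step.
    """
    rng = np.random.default_rng(seed)
    dim = obj_feats.shape[1]
    layers = [obj_feats]  # object features of every layer produced so far
    for _ in range(num_steps):
        # Interactions are computed against ALL previous layers,
        # not just the immediately preceding one.
        memory = np.concatenate(layers, axis=0)            # (L*N, dim)
        w_q = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        w_k = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        q = layers[-1] @ w_q                               # (N, dim)
        k = memory @ w_k                                   # (L*N, dim)
        attn = softmax(q @ k.T / np.sqrt(dim))             # cross-layer interactions
        new_feats = attn @ memory                          # integrate into new features
        layers.append(new_feats)
    # Pool the object features of all layers for the final prediction.
    return np.mean(np.stack(layers), axis=0)

feats = np.random.default_rng(1).standard_normal((5, 8))
out = cpdr_sketch(feats)
print(out.shape)  # (5, 8)
```

The key design point, per the abstract, is that the attention memory grows to include every earlier layer, so later reasoning steps can revisit features that adjacent-layer models would have discarded.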

