3.8 Proceedings Paper

CRA-Net: Composed Relation Attention Network for Visual Question Answering

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/3343031.3350925

Keywords

Visual Question Answering; Visual Relation; Attention Mechanism; Relation Attention

Funding

  1. National Natural Science Foundation of China [61572108, 61632007]
  2. Sichuan Science and Technology Program [2018GZDZX0032]

Ask authors/readers for more resources

The task of Visual Question Answering (VQA) is to answer a natural language question tied to the content of a visual image. Most existing VQA models either apply attention mechanism to locate the relevant object regions and/or utilize the off-the-shelf methods of the relation reasoning to detect object relations. However, they 1) mostly encode the simple relations which cannot sufficiently provide sophisticated knowledge for answering complicated visual questions; 2) seldom leverage the harmony cooperation of the object appearance feature and relation feature. To address these problems, we propose a novel end-to-end VQA model, termed Composed Relation Attention Network (CRA-Net). In specific, we devise two question-adaptive relation attention modules that can extract not only the fine-grained and precise binary relations but also the more sophisticated trinary relations. Both kinds of question-related relations can reveal deeper semantics, thereby enhancing the reasoning ability in question answering. Furthermore, our CRA-Net also combines the object appearance feature with the relation feature under the guidance of the corresponding question, which can reconcile the two types of features effectively. Extensive experiments on two large benchmark datasets, VQA-1.0 and VQA-2.0, demonstrate that our proposed model outperforms state-of-the-art approaches.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

3.8
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available