Proceedings Paper

CRA-Net: Composed Relation Attention Network for Visual Question Answering

PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19) (2019)

Pages 1202-1210

Publisher

ASSOC COMPUTING MACHINERY

DOI: 10.1145/3343031.3350925

Keywords

Visual Question Answering; Visual Relation; Attention Mechanism; Relation Attention

Funding

National Natural Science Foundation of China [61572108, 61632007]
Sichuan Science and Technology Program [2018GZDZX0032]

Abstract

The task of Visual Question Answering (VQA) is to answer a natural language question about the content of an image. Most existing VQA models apply an attention mechanism to locate the relevant object regions, use off-the-shelf relation reasoning methods to detect object relations, or do both. However, they 1) mostly encode simple relations, which cannot provide sufficiently sophisticated knowledge for answering complicated visual questions; and 2) seldom exploit harmonious cooperation between the object appearance feature and the relation feature. To address these problems, we propose a novel end-to-end VQA model, termed Composed Relation Attention Network (CRA-Net). Specifically, we devise two question-adaptive relation attention modules that extract not only fine-grained and precise binary relations but also more sophisticated trinary relations. Both kinds of question-related relations can reveal deeper semantics, thereby enhancing the reasoning ability in question answering. Furthermore, CRA-Net combines the object appearance feature with the relation feature under the guidance of the corresponding question, which reconciles the two types of features effectively. Extensive experiments on two large benchmark datasets, VQA-1.0 and VQA-2.0, demonstrate that the proposed model outperforms state-of-the-art approaches.
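
The abstract does not give formulas, so the following is only an illustrative sketch of what question-guided attention over binary (pairwise) object relations might look like. It is not the authors' architecture. The pair feature (elementwise product of two object vectors), the dot-product scoring against the question vector, and the function name `binary_relation_attention` are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def binary_relation_attention(objects, question):
    """Attend over ordered object pairs, guided by a question vector.

    Hypothetical choices (not taken from the paper):
      - relation feature of pair (i, j) = elementwise product of the two objects
      - attention score = dot product between relation feature and question
    """
    n = len(objects)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    rel_feats = [
        [a * b for a, b in zip(objects[i], objects[j])] for i, j in pairs
    ]
    # Question-adaptive weights: one weight per candidate relation.
    weights = softmax([dot(r, question) for r in rel_feats])
    # Weighted sum of relation features gives one attended relation vector.
    dim = len(question)
    attended = [
        sum(w * r[k] for w, r in zip(weights, rel_feats)) for k in range(dim)
    ]
    return pairs, weights, attended


# Toy input: three 2-d object vectors and a question vector that
# emphasizes the first feature dimension.
objects = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
question = [1.0, 0.0]
pairs, weights, attended = binary_relation_attention(objects, question)
```

Under the same assumptions, a trinary variant would enumerate object triples instead of pairs and score each triple feature against the question in the same way, at the cost of a cubic number of candidates.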
