Article

On the role of question encoder sequence model in robust visual question answering

Journal

PATTERN RECOGNITION
Volume 131

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2022.108883

Keywords

Visual question answering; Out-of-distribution performance; Gated recurrent unit; Transformer; Graph attention network

Abstract

Generalizing beyond seen experiences plays a significant role in developing robust and practical machine learning systems. It has been shown that current Visual Question Answering (VQA) models are over-dependent on language priors (spurious correlations between question types and their most frequent answers) in the train set and perform poorly on Out-of-Distribution (OOD) test sets. This behavior undermines the robustness of VQA models and limits their use in real-world situations. This paper shows that the sequence model architecture used in the question encoder plays a significant role in the OOD performance of VQA models. To demonstrate this, we perform a detailed analysis of several existing RNN-based and Transformer-based question encoders, and we additionally propose a novel Graph attention network (GAT)-based question encoder. Our study finds that a better choice of sequence model in the question encoder reduces overfitting to language biases and improves OOD performance in VQA, even without any additional, relatively complex bias-mitigation approaches.
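To make the idea of a GAT-based question encoder concrete, here is a minimal, illustrative sketch of a single graph-attention layer applied to question tokens, written in PyTorch. This is not the authors' implementation: the fully connected token graph, the mean pooling, the class and parameter names (GATQuestionEncoder, embed_dim, hidden_dim), and all dimensions are assumptions made for the example.

```python
# Illustrative sketch only, not the paper's implementation: question tokens
# are treated as graph nodes, and we assume a fully connected token graph.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATQuestionEncoder(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 300, hidden_dim: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.proj = nn.Linear(embed_dim, hidden_dim, bias=False)
        # Attention scorer applied to each concatenated node pair (h_i || h_j)
        self.attn = nn.Linear(2 * hidden_dim, 1, bias=False)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) integer token indices
        h = self.proj(self.embed(token_ids))      # node features: (B, T, H)
        B, T, H = h.shape
        # Build all pairwise features h_i || h_j: (B, T, T, 2H)
        hi = h.unsqueeze(2).expand(B, T, T, H)
        hj = h.unsqueeze(1).expand(B, T, T, H)
        e = F.leaky_relu(self.attn(torch.cat([hi, hj], dim=-1))).squeeze(-1)
        alpha = torch.softmax(e, dim=-1)          # attention over neighbours j
        nodes = torch.relu(torch.einsum("btj,bjh->bth", alpha, h))
        return nodes.mean(dim=1)                  # pooled question embedding

# Usage: encode a batch of two 6-token questions
enc = GATQuestionEncoder(vocab_size=10000)
q = torch.randint(0, 10000, (2, 6))
print(enc(q).shape)  # torch.Size([2, 512])
```

Practical GAT encoders typically add multi-head attention and may derive the token graph from a dependency parse rather than connecting every pair of tokens; the single-head, fully connected variant above is chosen only to keep the example short. The paper compares such a GAT-based encoder against GRU-based and Transformer-based alternatives.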

