Article

On the role of question encoder sequence model in robust visual question answering

Journal

PATTERN RECOGNITION
Volume 131

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2022.108883

Keywords

Visual question answering; Out-of-distribution performance; Gated recurrent unit; Transformer; Graph attention network

Abstract

Generalizing beyond seen experiences plays a significant role in developing robust and practical machine learning systems. It has been shown that current Visual Question Answering (VQA) models are over-dependent on language priors (spurious correlations between question types and their most frequent answers) in the train set and perform poorly on Out-of-Distribution (OOD) test sets. This behavior undermines the robustness of VQA models and limits their use in real-world situations. This paper shows that the sequence model architecture used in the question encoder plays a significant role in the OOD performance of VQA models. To demonstrate this, we perform a detailed analysis of several existing RNN-based and Transformer-based question encoders, and we additionally propose a novel Graph attention network (GAT)-based question encoder. Our study finds that a better choice of sequence model in the question encoder reduces overfitting to language biases and improves OOD performance in VQA, even without any additional, relatively complex bias-mitigation approaches.
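To make the idea of a GAT-based question encoder concrete, here is a minimal, illustrative sketch of a single graph-attention layer applied to question tokens, written in PyTorch. This is not the authors' implementation: the fully connected token graph, the mean pooling, the class and parameter names (GATQuestionEncoder, embed_dim, hidden_dim), and all dimensions are assumptions made for the example.

```python
# Illustrative sketch only, not the paper's implementation: question tokens
# are treated as graph nodes, and we assume a fully connected token graph.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATQuestionEncoder(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 300, hidden_dim: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.proj = nn.Linear(embed_dim, hidden_dim, bias=False)
        # Attention scorer applied to each concatenated node pair (h_i || h_j)
        self.attn = nn.Linear(2 * hidden_dim, 1, bias=False)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) integer token indices
        h = self.proj(self.embed(token_ids))      # node features: (B, T, H)
        B, T, H = h.shape
        # Build all pairwise features h_i || h_j: (B, T, T, 2H)
        hi = h.unsqueeze(2).expand(B, T, T, H)
        hj = h.unsqueeze(1).expand(B, T, T, H)
        e = F.leaky_relu(self.attn(torch.cat([hi, hj], dim=-1))).squeeze(-1)
        alpha = torch.softmax(e, dim=-1)          # attention over neighbours j
        nodes = torch.relu(torch.einsum("btj,bjh->bth", alpha, h))
        return nodes.mean(dim=1)                  # pooled question embedding

# Usage: encode a batch of two 6-token questions
enc = GATQuestionEncoder(vocab_size=10000)
q = torch.randint(0, 10000, (2, 6))
print(enc(q).shape)  # torch.Size([2, 512])
```

Practical GAT encoders typically add multi-head attention and may derive the token graph from a dependency parse rather than connecting every pair of tokens; the single-head, fully connected variant above is chosen only to keep the example short. The paper compares such a GAT-based encoder against GRU-based and Transformer-based alternatives.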

