4.6 Article

Co-attention graph convolutional network for visual question answering

Related references

Note: Only a subset of the references is listed.
Article Computer Science, Software Engineering

Multiple Context Learning Networks for Visual Question Answering

Pufen Zhang et al.

Summary: A novel Multiple Context Learning Network (MCLN) is proposed for visual question answering (VQA); it models multiple contexts and learns comprehensive context representations. Dedicated context learning modules learn object and word contexts, and deep context learning is achieved by stacking multiple context learning layers. The approach also introduces a contextualized text encoder based on pretrained BERT to strengthen textual context learning (a minimal encoding sketch follows this entry). MCLN outperforms previous state-of-the-art models on benchmark datasets.

SCIENTIFIC PROGRAMMING (2022)
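As a point of reference for the BERT-based text encoding mentioned above, here is a minimal Python sketch of obtaining contextualized question-word features with the Hugging Face transformers library; the variable names and the downstream use are illustrative assumptions, not the authors' implementation.

    import torch
    from transformers import BertModel, BertTokenizer

    # Load a pretrained BERT encoder (assumption: bert-base-uncased suffices here).
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    bert = BertModel.from_pretrained("bert-base-uncased")

    question = "What color is the umbrella?"
    inputs = tokenizer(question, return_tensors="pt")
    with torch.no_grad():
        outputs = bert(**inputs)

    # One contextualized vector per (sub)word token, shape (1, seq_len, 768);
    # these word-context features would feed the stacked context learning layers.
    word_context = outputs.last_hidden_state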

Article Computer Science, Information Systems

Evaluation of graph convolutional networks performance for visual question answering on reasoning datasets

Abdulganiyu Abdu Yusuf et al.

Summary: Graph neural networks, especially the graph convolutional network (GCN), have been widely used for vision-to-language tasks such as visual question answering (VQA) and have shown promising results in capturing spatial and semantic relationships. However, applying a GCN to different subtasks in VQA datasets can lead to varying results, and factors such as training and testing size, evaluation metrics, and hyperparameters also affect VQA performance. This study proposes a GCN framework for VQA that uses fine-tuned word representations to handle reasoning-type questions, and the framework's performance is evaluated with various measures (a minimal GCN-layer sketch follows this entry). Results on the GQA and VQA 2.0 datasets slightly outperform most existing methods.

MULTIMEDIA TOOLS AND APPLICATIONS (2022)
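To make the graph-convolution step concrete, the following is a minimal PyTorch sketch of a single GCN layer over detected object features; the dimensions and the fully connected adjacency are assumptions for illustration, not the framework evaluated in the paper.

    import torch
    import torch.nn as nn

    class GraphConvLayer(nn.Module):
        """One graph convolution over region features: X' = ReLU(A_norm X W)."""

        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.linear = nn.Linear(in_dim, out_dim)

        def forward(self, x, adj):
            # x:   (num_objects, in_dim) region features, e.g. from an object detector
            # adj: (num_objects, num_objects) adjacency including self-loops
            deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
            norm_adj = adj / deg                    # row-normalized adjacency
            return torch.relu(self.linear(norm_adj @ x))

    # Example: 36 regions with 2048-d features on a fully connected graph.
    x = torch.randn(36, 2048)
    adj = torch.ones(36, 36)
    out = GraphConvLayer(2048, 512)(x, adj)         # (36, 512)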

Article Computer Science, Artificial Intelligence

Visual-semantic graph neural network with pose-position attentive learning for group activity recognition

Tianshan Liu et al.

Summary: The article proposes a group activity recognition method based on a visual-semantic graph neural network with pose-position attentive learning. The method improves recognition performance by constructing a bi-modal visual graph and a semantic graph and by using pose and position information for attentive aggregation.

NEUROCOMPUTING (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Comprehending and Ordering Semantics for Image Captioning

Yehao Li et al.

Summary: This paper proposes a new Transformer-style structure called COS-Net, which integrates semantic comprehension and ordering processes into a single architecture. By utilizing cross-modal retrieval and a semantic ranker, COS-Net achieves superior performance in image captioning tasks.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Visual Abductive Reasoning

Chen Liang et al.

Summary: This paper proposes a new task and dataset for examining the abductive reasoning ability of machine intelligence in everyday visual situations. The authors design a strong baseline model and conduct experiments on a large-scale dataset, showing that the model outperforms many well-known video-language models on the visual abductive reasoning (VAR) task but still falls behind human performance.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) (2022)

Article Computer Science, Information Systems

Cross-modal Graph Matching Network for Image-text Retrieval

Yuhao Cheng et al.

Summary: Image-text retrieval is a fundamental task in cross-modal research. Existing methods can be classified into independent representation matching and cross-interaction matching. This article proposes a method called CGMN, which explores both intra-relations and inter-relations without introducing interaction between the two modality networks. Experiments show that CGMN outperforms state-of-the-art methods in image retrieval and is more efficient than interactive matching methods.

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS (2022)

Article Computer Science, Information Systems

Answer Questions with Right Image Regions: A Visual Attention Regularization Approach

Yibing Liu et al.

Summary: This article introduces a novel visual attention regularization method, AttReg, for better visual grounding in visual question answering (VQA). AttReg identifies essential but ignored image regions for question answering and leverages a mask-guided learning scheme to regularize the visual attention so that it focuses on these key regions (an illustrative regularization sketch follows this entry). Extensive experiments demonstrate the effectiveness of AttReg, which achieves state-of-the-art accuracy on benchmark datasets.

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS (2022)
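A minimal sketch of one way such a regularizer can be written, assuming a binary mask of key regions is available; this is an illustrative penalty on attention mass outside the masked regions, not AttReg's exact mask-guided learning scheme.

    import torch
    import torch.nn.functional as F

    def attention_regularization(att_logits, key_region_mask):
        # att_logits:      (batch, num_regions) unnormalized visual attention scores
        # key_region_mask: (batch, num_regions) 1 for essential regions, 0 otherwise
        att = F.softmax(att_logits, dim=-1)
        off_mask = att * (1.0 - key_region_mask)    # attention mass off the key regions
        return off_mask.sum(dim=-1).mean()

    # total_loss = vqa_loss + reg_weight * attention_regularization(logits, mask)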

Article Computer Science, Information Systems

Object-difference drived graph convolutional networks for visual question answering

Xi Zhu et al.

Summary: This research proposes an object-difference based graph learner that is combined with a soft-attention mechanism and graph convolutional networks, achieving strong performance on the VQA task (an illustrative graph-learner sketch follows this entry). Experimental results demonstrate that the model outperforms baseline methods on the VQA 2.0 dataset.

MULTIMEDIA TOOLS AND APPLICATIONS (2021)
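The following is a minimal sketch of a graph learner that scores edges from pairwise object-feature differences and returns a soft adjacency for a downstream GCN; the layer sizes and the scoring network are illustrative assumptions, not the authors' code.

    import torch
    import torch.nn as nn

    class ObjectDifferenceGraphLearner(nn.Module):
        def __init__(self, feat_dim, hidden_dim=512):
            super().__init__()
            self.score = nn.Sequential(
                nn.Linear(feat_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, 1),
            )

        def forward(self, obj_feats):
            # obj_feats: (num_objects, feat_dim) detected region features
            diff = obj_feats.unsqueeze(1) - obj_feats.unsqueeze(0)   # (N, N, feat_dim)
            logits = self.score(diff).squeeze(-1)                    # (N, N) edge scores
            return torch.softmax(logits, dim=-1)                     # soft adjacency

    # The learned adjacency would then drive graph convolutions over the objects.
    adj = ObjectDifferenceGraphLearner(2048)(torch.randn(36, 2048))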

Article Computer Science, Artificial Intelligence

Interpretable Visual Question Answering by Reasoning on Dependency Trees

Qingxing Cao et al.

Summary: Collaborative reasoning over image-question pairs is important for interpretable visual question answering systems, but current models rely heavily on annotations or hand-crafted rules, leading to either heavy workloads or poor performance. This paper introduces a novel neural network model, PTGRN, which performs global reasoning on a dependency tree parsed from the question (a parsing sketch follows this entry) and shows superiority over current VQA methods on relational datasets. Visualization results highlight the explainable capability of PTGRN.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2021)
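As an illustration of reasoning over a question's parse structure, here is a minimal sketch that converts a dependency parse into an adjacency matrix a tree-structured module could traverse; it assumes spaCy with the en_core_web_sm model and is not PTGRN's implementation.

    import numpy as np
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("What is the man to the left of the red car holding?")

    n = len(doc)
    adj = np.zeros((n, n), dtype=np.float32)
    for token in doc:
        if token.head.i != token.i:               # the root points to itself; skip it
            adj[token.i, token.head.i] = 1.0      # child -> head
            adj[token.head.i, token.i] = 1.0      # head -> child (undirected tree)

    # A graph reasoning module would propagate question-guided evidence
    # over image regions along these parse-tree edges.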

Article Computer Science, Artificial Intelligence

Rich Visual Knowledge-Based Augmentation Network for Visual Question Answering

Liyang Zhang et al.

Summary: The new framework KAN utilizes object-related knowledge and a knowledge graph to assist in the reasoning process of VQA, with an attention module that adaptively balances the importance of external knowledge against detected objects. Extensive experiments demonstrate that KAN achieves state-of-the-art performance on challenging VQA datasets and provides benefits to VQA baselines.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2021)

Article Computer Science, Artificial Intelligence

Visual question answering model based on graph neural network and contextual attention

Himanshu Sharma et al.

Summary: Visual question answering (VQA) is an emerging research area at the intersection of computer vision and natural language processing that aims to predict answers to natural-language questions about images. Current VQA approaches, however, often overlook the relationships and reasoning among regions of interest. The VQA model proposed in this paper takes previously attended visual content into account, leading to improved accuracy in answer prediction.

IMAGE AND VISION COMPUTING (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Interpretable Visual Reasoning via Induced Symbolic Space

Zhonghao Wang et al.

Summary: The study focuses on concept induction in visual reasoning, achieving an interpretable model through a new framework named OCCAM. By inducing concepts of objects and relations and applying OCCAM over the induced symbolic concept space, the model achieves state-of-the-art performance on the CLEVR and GQA datasets without human-annotated functional programs.

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Linguistically Routing Capsule Network for Out-of-distribution Visual Question Answering

Qingxing Cao et al.

Summary: The study introduces a capsule network with linguistically guided routing to improve generalization to out-of-distribution data in visual question answering and reports promising experimental results.

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) (2021)

Proceedings Paper Computer Science, Information Systems

Passage Retrieval for Outside-Knowledge Visual Question Answering

Chen Qu et al.

Summary: This work addresses multi-modal information needs involving text questions and images by focusing on passage retrieval for outside-knowledge visual question answering. The study shows that dense retrieval significantly outperforms sparse retrieval and that image captions are more informative than object names (a minimal dense-retrieval sketch follows this entry).

SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (2021)
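A minimal sketch of the dense-retrieval side, assuming the sentence-transformers library; the encoder name, the caption-augmented query, and the toy passages are illustrative assumptions rather than the paper's setup.

    from sentence_transformers import SentenceTransformer, util

    encoder = SentenceTransformer("all-MiniLM-L6-v2")

    # Caption-augmented query: the image is represented by its caption text.
    query = ("What country does this flag belong to? "
             "Caption: a red flag with a white cross.")
    passages = [
        "The flag of Switzerland consists of a white cross on a red background.",
        "The flag of Japan is a white banner with a red disc in the center.",
    ]

    q_emb = encoder.encode(query, convert_to_tensor=True)
    p_emb = encoder.encode(passages, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, p_emb)           # (1, num_passages) similarities
    best_passage = passages[scores.argmax().item()]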

Article Computer Science, Artificial Intelligence

Multimodal feature fusion by relational reasoning and attention for visual question answering

Weifeng Zhang et al.

INFORMATION FUSION (2020)

Proceedings Paper Computer Science, Artificial Intelligence

Language-Conditioned Graph Networks for Relational Reasoning

Ronghang Hu et al.

2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) (2019)

Proceedings Paper Computer Science, Artificial Intelligence

Reasoning Visual Dialogs with Structural and Partial Observations

Zilong Zheng et al.

2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) (2019)

Proceedings Paper Computer Science, Software Engineering

Stacked Self-Attention Networks for Visual Question Answering

Qiang Sun et al.

ICMR'19: PROCEEDINGS OF THE 2019 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (2019)

Article Computer Science, Artificial Intelligence

Image Captioning and Visual Question Answering Based on Attributes and External Knowledge

Qi Wu et al.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2018)

Article Computer Science, Artificial Intelligence

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren et al.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2017)

Proceedings Paper Computer Science, Artificial Intelligence

Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

Yash Goyal et al.

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) (2017)

Proceedings Paper Computer Science, Artificial Intelligence

Visual Dialog

Abhishek Das et al.

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) (2017)

Proceedings Paper Computer Science, Artificial Intelligence

VQA: Visual Question Answering

Stanislaw Antol et al.

2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) (2015)