Explanation vs. attention: A two-player game to obtain attention for VQA and visual dialog

Article Computer Science, Artificial Intelligence

Text-instance graph: Exploring the relational semantics for text-based visual question answering

Xiangpeng Li et al.

Summary: This study addresses the TextVQA problem and proposes a novel Text-Instance Graph (TIG) network for it. TIG models relationships between OCR tokens and visual objects by building an OCR-OBJ graph, and introduces a dynamic OCR-OBJ graph network to handle questions that require complex reasoning. Experimental results show that the proposed method outperforms existing approaches.

PATTERN RECOGNITION (2022)

Article Computer Science, Artificial Intelligence

DecomVQANet: Decomposing visual question answering deep network via tensor decomposition and regression

Zongwen Bai et al.

Summary: The study proposes a comprehensive solution for compressing and accelerating Visual Question Answering systems. By applying various decomposition methods and regression strategies, the authors compress the fully connected layers of the Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) components, achieving high compression ratios with minimal accuracy loss.

PATTERN RECOGNITION (2021)

Article Computer Science, Artificial Intelligence

Probabilistic framework for solving visual dialog

Badri N. Patro et al.

Summary: This paper proposes a probabilistic framework for the Visual Dialog task, with the aim of understanding and analyzing the sources of uncertainty in the task. The proposed framework yields a better-performing and more explainable visual dialog system.

PATTERN RECOGNITION (2021)

Article Computer Science, Artificial Intelligence

GuessWhich? Visual dialog with attentive memory network

Lei Zhao et al.

Summary: Visual dialog is a task in which two agents with asymmetric information communicate in natural language. The authors propose a novel approach based on an attentive memory network that makes full use of both the image and the dialog history. Experimental results on the visual dialog task show that the method outperforms existing state-of-the-art methods.

PATTERN RECOGNITION (2021)

Article Computer Science, Artificial Intelligence

Accuracy vs. complexity: A trade-off in visual question answering models

Moshiur Farazi et al.

Summary: This paper systematically studies the trade-off between model complexity and performance in VQA models, focusing on the impact of multi-modal fusion. Based on a thorough experimental evaluation, the authors present three proposals: one optimized for minimal complexity, one that balances complexity and accuracy, and one that targets state-of-the-art VQA performance.

PATTERN RECOGNITION (2021)

Article Computer Science, Artificial Intelligence

Visual question answering: Which investigated applications?

Silvio Barra et al.

Summary: Visual Question Answering (VQA) is a challenging research area that requires combining computer vision and natural language processing. Unlike other visual tasks, VQA requires relating the semantics of images or videos to questions posed in natural language. Recent research has focused on image processing methods, language processing methods, and information fusion approaches.

PATTERN RECOGNITION LETTERS (2021)

Article Computer Science, Artificial Intelligence

Dual self-attention with co-attention networks for visual question answering

Yun Liu et al.

Summary: Visual Question Answering (VQA) is an important task for joint vision and language understanding. The authors propose DSACA, a model that uses dual self-attention with co-attention networks to integrate local features with global dependencies.

PATTERN RECOGNITION (2021)

Article Computer Science, Artificial Intelligence