Encoder-decoder cycle for visual question answering based on perception-action cycle

Article Computer Science, Artificial Intelligence

Adversarial Lagrangian integrated contrastive embedding for limited size datasets

Amin Jalali et al.

Summary: This study proposes an adversarial Lagrangian integrated contrastive embedding (ALICE) method for small datasets. The method shows improved accuracy and training convergence through pre-trained adversarial transfer. The study also investigates an adversarial integrated contrastive model with various augmentation techniques and incorporates multi-objective augmented Lagrangian multipliers to encourage low rank and sparsity.

NEURAL NETWORKS (2023)

Article Computer Science, Artificial Intelligence

MRA-Net: Improving VQA Via Multi-Modal Relation Attention Network

Liang Peng et al.

Summary: Visual Question Answering (VQA) is a task that aims to answer natural language questions about visual images. Existing approaches often use attention mechanisms to focus on relevant visual objects and consider the relationships between objects. However, these approaches have limitations in modeling complex object relationships and leveraging the cooperation between visual appearance and relationships. To address these issues, we propose a novel end-to-end VQA model, called Multi-modal Relation Attention Network (MRA-Net). The model combines textual and visual relations, utilizes self-guided word relation attention, and incorporates question-adaptive visual relation attention modules to improve performance and interpretability. Experimental results on multiple benchmark datasets demonstrate that our proposed model outperforms state-of-the-art approaches.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2022)

Article Computer Science, Artificial Intelligence

Learning multimodal relationship interaction for visual relationship detection

Zhixuan Liu et al.

Summary: This paper proposes a Multimodal Similarity Guided Relationship Interaction Network (MSGRIN) to explicitly model the relations of relationships in the graph neural network paradigm. In visual scenes, MSGRIN constructs an adaptive graph by taking visual relationships as nodes and enhances deep message passing through the introduction of Entity Appearance Reconstruction, Entity Relevance Filtering, and Multimodal Similarity Attention.

PATTERN RECOGNITION (2022)

Article Computer Science, Information Systems

Visual Relationship Detection With Image Position and Feature Information Embedding and Fusion

Jinghui Peng et al.

Summary: Visual relationship detection is an important direction in image processing that aims to identify the relationships between objects. This paper proposes a method that fuses image position and feature information to improve visual relationship detection, using a pre-processed dataset for pre-training and an entity relationship network.

IEEE ACCESS (2022)

Article Computer Science, Artificial Intelligence

Low-shot transfer with attention for highly imbalanced cursive character recognition

Amin Jalali et al.

Summary: This paper proposes a new method for recognizing ancient Korean-Chinese cursive characters. It overcomes challenges such as class imbalance and limited training data by using low-shot regularization and a decoupled classifier. Experimental results demonstrate the effectiveness of this approach in handling extreme class imbalance.

NEURAL NETWORKS (2021)

Article Computer Science, Information Systems

Visual Question Answering With Dense Inter- and Intra-Modality Interactions

Fei Liu et al.

Summary: The study introduces a novel DenIII framework for visual question answering, which models dense inter- and intra-modality interactions by densely connecting all pairwise layers of the network. Extensive experiments confirm the effectiveness of the method, with DenIII achieving state-of-the-art or competitive performance on three publicly available datasets.

IEEE TRANSACTIONS ON MULTIMEDIA (2021)

Article Computer Science, Artificial Intelligence