4.7 Article

Encoder-decoder cycle for visual question answering based on perception-action cycle

Related references

Note: Only some of the references are listed; download the original article for the complete reference information.

Article Computer Science, Artificial Intelligence

Adversarial Lagrangian integrated contrastive embedding for limited size datasets

Amin Jalali et al.

Summary: This study proposes an adversarial Lagrangian integrated contrastive embedding (ALICE) method for small-sized datasets. The method demonstrates improved accuracy and training convergence through pre-trained adversarial transfer. The study also investigates an adversarial integrated contrastive model with various augmentation techniques and incorporates multi-objective augmented Lagrangian multipliers to encourage low rank and sparsity.

NEURAL NETWORKS (2023)

Article Computer Science, Artificial Intelligence

MRA-Net: Improving VQA Via Multi-Modal Relation Attention Network

Liang Peng et al.

Summary: Visual Question Answering (VQA) is a task that aims to answer natural language questions about visual images. Existing approaches often use attention mechanisms to focus on relevant visual objects and consider the relationships between objects. However, these approaches have limitations in modeling complex object relationships and leveraging the cooperation between visual appearance and relationships. To address these issues, we propose a novel end-to-end VQA model, called Multi-modal Relation Attention Network (MRA-Net). The model combines textual and visual relations, utilizes self-guided word relation attention, and incorporates question-adaptive visual relation attention modules to improve performance and interpretability. Experimental results on multiple benchmark datasets demonstrate that our proposed model outperforms state-of-the-art approaches.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2022)

Article Computer Science, Artificial Intelligence

Learning multimodal relationship interaction for visual relationship detection

Zhixuan Liu et al.

Summary: This paper proposes a Multimodal Similarity Guided Relationship Interaction Network (MSGRIN) to explicitly model the relations of relationships in the graph neural network paradigm. In visual scenes, MSGRIN constructs an adaptive graph by taking visual relationships as nodes and enhances deep message passing through the introduction of Entity Appearance Reconstruction, Entity Relevance Filtering, and Multimodal Similarity Attention.

PATTERN RECOGNITION (2022)

Article Computer Science, Information Systems

Visual Relationship Detection With Image Position and Feature Information Embedding and Fusion

Jinghui Peng et al.

Summary: Visual relationship detection is an important direction in image processing that aims to explore relationships between objects. This paper proposes a method that fuses image position and feature information to improve visual relationship detection, using a pre-processed pre-training dataset and an entity relationship network.

IEEE ACCESS (2022)

Article Computer Science, Artificial Intelligence

Low-shot transfer with attention for highly imbalanced cursive character recognition

Amin Jalali et al.

Summary: This paper proposes a new method for recognizing ancient Korean-Chinese cursive characters. It addresses class imbalance and limited training data by using low-shot regularization and a decoupled classifier. Experimental results demonstrate the effectiveness of this approach in handling extreme class imbalances.

NEURAL NETWORKS (2021)

Article Computer Science, Information Systems

Visual Question Answering With Dense Inter- and Intra-Modality Interactions

Fei Liu et al.

Summary: The study introduces a novel DenIII framework for visual question answering, which models dense inter- and intra-modality interactions by densely connecting all pairwise layers of the network. Extensive experiments confirm the effectiveness of the method, with DenIII achieving state-of-the-art or competitive performance on three publicly available datasets.

IEEE TRANSACTIONS ON MULTIMEDIA (2021)

Article Computer Science, Artificial Intelligence

High cursive traditional Asian character recognition using integrated adaptive constraints in ensemble of DenseNet and Inception models

Amin Jalali et al.

PATTERN RECOGNITION LETTERS (2020)

Article Computer Science, Information Systems

Atrial Fibrillation Prediction With Residual Network Using Sensitivity and Orthogonality Constraints

Amin Jalali et al.

IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS (2020)

Article Computer Science, Artificial Intelligence

Sensitive deep convolutional neural network for face recognition at large standoffs with small dataset

Amin Jalali et al.

EXPERT SYSTEMS WITH APPLICATIONS (2017)

Article Computer Science, Artificial Intelligence

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

Ranjay Krishna et al.

INTERNATIONAL JOURNAL OF COMPUTER VISION (2017)

Proceedings Paper Computer Science, Artificial Intelligence

An Analysis of Visual Question Answering Algorithms

Kushal Kafle et al.

2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) (2017)

Proceedings Paper Computer Science, Artificial Intelligence

Detecting Visual Relationships with Deep Relational Networks

Bo Dai et al.

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) (2017)

Proceedings Paper Computer Science, Artificial Intelligence

MUTAN: Multimodal Tucker Fusion for Visual Question Answering

Hedi Ben-younes et al.

2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) (2017)