4.8 Article

MRA-Net: Improving VQA Via Multi-Modal Relation Attention Network

Related references

Note: only a subset of the references is listed.

Article Computer Science, Artificial Intelligence

Exploiting Subspace Relation in Semantic Labels for Cross-Modal Hashing

Heng Tao Shen et al.

Summary: This paper proposes SRLCH, a cross-modal hashing method that exploits relation information in the semantic label subspace to strengthen similarity between different modalities. The algorithm preserves inter-modality relations, discrete constraints, and nonlinear structures, and admits a closed-form solution for the binary codes. An illustrative sketch of a generic cross-modal hashing pipeline follows this entry.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2021)
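The summary above mentions exploiting label relations in a semantic subspace and a closed-form solution for binary codes. The sketch below is an illustration only, not the SRLCH algorithm: it shows a generic cross-modal hashing pipeline on synthetic data, where hypothetical linear projections for each modality are fit in closed form (ridge regression) toward label-derived target codes and then binarized with sign() so that semantically related items land close in Hamming space. All variable names and the specific formulation are assumptions made for this example.

```python
# Toy cross-modal hashing sketch (NOT the SRLCH algorithm itself).
# Assumed setup: paired image/text features sharing class labels,
# label-derived target codes, per-modality linear projections, sign binarization.
import numpy as np

rng = np.random.default_rng(0)
n, d_img, d_txt, n_labels, n_bits = 200, 64, 32, 10, 16

# Synthetic paired data: each (image, text) pair shares a one-hot label.
labels = rng.integers(0, n_labels, size=n)
Y = np.eye(n_labels)[labels]                                   # (n, n_labels)
X_img = Y @ rng.normal(size=(n_labels, d_img)) + 0.1 * rng.normal(size=(n, d_img))
X_txt = Y @ rng.normal(size=(n_labels, d_txt)) + 0.1 * rng.normal(size=(n, d_txt))

# Hypothetical label-subspace target codes, standing in for the semantic
# label subspace the paper exploits.
L = np.sign(Y @ rng.normal(size=(n_labels, n_bits)))           # (n, n_bits)

def fit_projection(X, T, lam=1e-2):
    # Closed-form ridge least-squares projection from features X to targets T.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ T)

W_img = fit_projection(X_img, L)
W_txt = fit_projection(X_txt, L)

# Binarize: sign() maps projected features to {-1, +1} hash codes.
B_img = np.sign(X_img @ W_img)
B_txt = np.sign(X_txt @ W_txt)

# Cross-modal retrieval check: for ±1 codes, Hamming distance = (bits - dot)/2.
# Same-label image/text pairs should be closer on average than different-label ones.
hamming = (n_bits - B_img @ B_txt.T) / 2                       # (n, n)
same = hamming[labels[:, None] == labels[None, :]].mean()
diff = hamming[labels[:, None] != labels[None, :]].mean()
print(f"mean Hamming distance: same-label {same:.2f} vs different-label {diff:.2f}")
```

Running the sketch should print a clearly smaller mean Hamming distance for same-label pairs, which is the basic property any cross-modal hashing scheme of this kind aims to achieve.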

Article Computer Science, Artificial Intelligence

Cross-Modal Attention With Semantic Consistence for Image-Text Matching

Xing Xu et al.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2020)

Article Computer Science, Artificial Intelligence

More Is Better: Precise and Detailed Image Captioning Using Online Positive Recall and Missing Concepts Mining

Mingxing Zhang et al.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2019)

Article Automation & Control Systems

Describing Video With Attention-Based Bidirectional LSTM

Yi Bin et al.

IEEE TRANSACTIONS ON CYBERNETICS (2019)

Article Computer Science, Information Systems

Word-to-region attention network for visual question answering

Liang Peng et al.

MULTIMEDIA TOOLS AND APPLICATIONS (2019)

Proceedings Paper Computer Science, Interdisciplinary Applications

CRA-Net: Composed Relation Attention Network for Visual Question Answering

Liang Peng et al.

PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19) (2019)

Article Computer Science, Artificial Intelligence

Video Captioning by Adversarial LSTM

Yang Yang et al.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2018)

Article Computer Science, Artificial Intelligence

Robust discrete code modeling for supervised hashing

Yadan Luo et al.

PATTERN RECOGNITION (2018)

Article Computer Science, Artificial Intelligence

Beyond Bilinear: Generalized Multimodal Factorized High-Order Pooling for Visual Question Answering

Zhou Yu et al.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2018)

Article Computer Science, Artificial Intelligence

Learning Discriminative Binary Codes for Large-scale Cross-modal Retrieval

Xing Xu et al.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2017)

Article Computer Science, Artificial Intelligence

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

Ranjay Krishna et al.

INTERNATIONAL JOURNAL OF COMPUTER VISION (2017)

Proceedings Paper Computer Science, Artificial Intelligence

An Analysis of Visual Question Answering Algorithms

Kushal Kafle et al.

2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) (2017)

Proceedings Paper Computer Science, Artificial Intelligence

Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

Yash Goyal et al.

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) (2017)

Proceedings Paper Computer Science, Theory & Methods

Adaptively Attending to Visual Attributes and Linguistic Knowledge for Captioning

Yi Bin et al.

PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17) (2017)

Proceedings Paper Computer Science, Artificial Intelligence

Bidirectional Long-Short Term Memory for Video Description

Yi Bin et al.

PROCEEDINGS OF THE 2016 ACM MULTIMEDIA CONFERENCE (MM'16) (2016)

Proceedings Paper Computer Science, Artificial Intelligence

VQA: Visual Question Answering

Stanislaw Antol et al.

2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) (2015)