4.7 Article

Path-Wise Attention Memory Network for Visual Question Answering

Related references

Note: Only some of the references are listed.
Article Computer Science, Artificial Intelligence

Robust Sparse Weighted Classification for Crowdsourcing

Hao Yu et al.

Summary: This paper proposes a robust sparse weighted classification algorithm to address the issue of obtaining high-quality labels in crowdsourcing tasks by adjusting misclassified samples in the original labels.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2023)

Article Computer Science, Hardware & Architecture

Deep Multigraph Hierarchical Enhanced Semantic Representation for Cross-Modal Retrieval

Lei Zhu et al.

Summary: This article proposes a novel method for cross-modal representation that efficiently achieves cross-modal semantic alignment and reduces the heterogeneity gap. By combining multigraph-based hierarchical enhanced semantic representation with cross-modal adversarial learning, the method captures multigrained semantic knowledge and generates modality-invariant representations.

IEEE MULTIMEDIA (2022)

Article Computer Science, Artificial Intelligence

PPIS-JOIN: A Novel Privacy-Preserving Image Similarity Join Method

Chengyuan Zhang et al.

Summary: This study introduces a novel privacy-preserving image similarity join method that combines deep image hashing with affine transformation to support efficient and accurate similarity search, and it reports better performance in experiments.

NEURAL PROCESSING LETTERS (2022)

Article Engineering, Civil

Deep Multi-Modal Object Detection and Semantic Segmentation for Autonomous Driving: Datasets, Methods, and Challenges

Di Feng et al.

Summary: Deep learning is driving recent advancements in perception for autonomous driving through the fusion of multiple sensors, but questions regarding network architecture design, fusion timing, and fusion methods remain open. This review systematically summarizes methodologies for deep multi-modal object detection and semantic segmentation in autonomous driving, and discusses challenges and open questions. It provides an overview of the topic and of fusion methodologies, and offers an interactive online platform for further exploration.

IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS (2021)

Article Computer Science, Information Systems

HCMSL: Hybrid Cross-modal Similarity Learning for Cross-modal Retrieval

Chengyuan Zhang et al.

Summary: The Hybrid Cross-Modal Similarity Learning model (HCMSL) proposed in this article effectively addresses the similarity measurement issue in cross-modal retrieval by capturing semantic information and establishing a common subspace between different modalities. Comprehensive experiments demonstrate significant improvements over existing techniques on real datasets.

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS (2021)

Proceedings Paper Computer Science, Artificial Intelligence

M2GUDA: Multi-Metrics Graph-Based Unsupervised Domain Adaptation for Cross-Modal Hashing

Chengyuan Zhang et al.

Summary: This paper proposes an unsupervised domain adaptation method for cross-modal hashing, using multiple consistency constraints for domain adaptation learning. Experimental results demonstrate the effectiveness of this method.

PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21) (2021)

Article Computer Science, Artificial Intelligence

Re-Attention for Visual Question Answering

Wenya Guo et al.

Summary: In this paper, a re-attention framework is proposed to utilize answer information for describing visual contents in VQA. Experiments show that the proposed model performs favorably against state-of-the-art methods.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2021)

Article Computer Science, Artificial Intelligence

Stimulus-driven and concept-driven analysis for image caption generation

Songtao Ding et al.

NEUROCOMPUTING (2020)

Article Computer Science, Artificial Intelligence

Multimodal feature fusion by relational reasoning and attention for visual question answering

Weifeng Zhang et al.

INFORMATION FUSION (2020)

Article Computer Science, Interdisciplinary Applications

Unpaired Multi-Modal Segmentation via Knowledge Distillation

Qi Dou et al.

IEEE TRANSACTIONS ON MEDICAL IMAGING (2020)

Article Computer Science, Artificial Intelligence

Compositional Attention Networks With Two-Stream Fusion for Video Question Answering

Ting Yu et al.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2020)

Article Computer Science, Artificial Intelligence

Multi-Task Consistency-Preserving Adversarial Hashing for Cross-Modal Retrieval

De Xie et al.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2020)

Article Computer Science, Artificial Intelligence

DRAU: Dual Recurrent Attention Units for Visual Question Answering

Ahmed Osman et al.

COMPUTER VISION AND IMAGE UNDERSTANDING (2019)

Article Computer Science, Theory & Methods

A long video caption generation algorithm for big video data retrieval

Songtao Ding et al.

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE (2019)

Article Computer Science, Information Systems

Predicting Visual Features From Text for Image and Video Caption Retrieval

Jianfeng Dong et al.

IEEE TRANSACTIONS ON MULTIMEDIA (2018)

Article Computer Science, Information Systems

Joint feature selection and graph regularization for modality-dependent cross-modal retrieval

Li Wang et al.

JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION (2018)

Article Computer Science, Artificial Intelligence

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren et al.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2017)