4.5 Article

Multi-label adversarial fine-grained cross-modal retrieval

Related references

Note: Only some of the references are listed; download the original article for the complete reference information.
Article Computer Science, Artificial Intelligence

Integrating Multi-Label Contrastive Learning With Dual Adversarial Graph Neural Networks for Cross-Modal Retrieval

Shengsheng Qian et al.

Summary: With the increasing amount of multimodal data, cross-modal retrieval has become a hot research topic, but existing techniques have limitations in eliminating modality heterogeneity, considering label relationships, and efficiently aligning representation and label similarity. To address these problems, this article proposes two models that use dual generative adversarial networks to project multimodal data into a common representation space, employ multi-hop graph neural networks to model label relation dependencies, and introduce a novel soft multi-label contrastive loss to align representation and label similarity. Experimental results on three benchmark datasets demonstrate the superiority of the proposed method.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)

Article Engineering, Electrical & Electronic

Task-Adaptive Attention for Image Captioning

Chenggang Yan et al.

Summary: This paper proposes a Task-Adaptive Attention module for image captioning, which learns non-visual clues to keep attention models from being misled during word generation. The module is further enhanced with diversity regularization to improve its expressive ability. Experimental results on the MSCOCO captioning dataset show that the module improves the performance of a vanilla Transformer-based image captioning model.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2022)

Article Computer Science, Hardware & Architecture

Semantic Constraints Matrix Factorization Hashing for cross-modal retrieval

Weian Li et al.

Summary: This paper proposes a hashing method that considers both modality-specific and cross-modal semantic information. The method achieves fast retrieval of hash codes with a closed-form solution and outperforms current methods in terms of mean average precision (mAP).

COMPUTERS & ELECTRICAL ENGINEERING (2022)

Article Engineering, Electrical & Electronic

Adversarial Graph Convolutional Network for Cross-Modal Retrieval

Xinfeng Dong et al.

Summary: The completeness of semantic expression plays a crucial role in cross-modal retrieval tasks. By utilizing a graph convolutional network, our proposed model can obtain semantic complementary information and strengthen the similarities between samples with the same semantics. Experimental results demonstrate the superiority of our model compared to other state-of-the-art methods on three benchmark datasets.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2022)

Article Computer Science, Artificial Intelligence

Image-text bidirectional learning network based cross-modal retrieval

Zhuoyi Li et al.

Summary: This paper proposes a novel image-text Bidirectional Learning Network (BLN) based cross-modal retrieval method. It constructs a common representation space and measures the similarity of heterogeneous data. The method utilizes a multi-layer supervision network and a bidirectional crisscross loss function to preserve modal invariance. Experimental results demonstrate the effectiveness and superiority of the proposed method over existing cross-modal retrieval methods.

NEUROCOMPUTING (2022)

Article Computer Science, Hardware & Architecture

Semi-supervised constrained graph convolutional network for cross-modal retrieval

Lei Zhang et al.

Summary: In this paper, a novel semi-supervised method called Semi-supervised Constrained Graph Convolutional Network (SCGCN) is proposed to exploit correlations from batch samples of data with different modalities. The method combines deep supervised learning with unsupervised learning to reduce the requirement of labeled data. By utilizing deep neural networks and graph convolutional networks, the method learns a modality-invariant semantic space and generates predicted labels from unlabeled data. Extensive experiments demonstrate the effectiveness of the proposed approach.

COMPUTERS & ELECTRICAL ENGINEERING (2022)

Article Computer Science, Information Systems

Semi-supervised cross-modal hashing with multi-view graph representation

Xiao Shen et al.

Summary: This study proposes a novel multi-view graph cross-modal hashing (MGCH) method for generating hash codes in a semi-supervised manner. Unlike conventional graph-based hashing methods, MGCH only employs multi-view graphs as learning assistance and demonstrates superiority in cross-modal hashing tasks.

INFORMATION SCIENCES (2022)

Article Computer Science, Artificial Intelligence

An efficient dual semantic preserving hashing for cross-modal retrieval

Yun Liu et al.

Summary: The proposed Dual Semantic Preserving Hashing (DSPH) method addresses the challenges of semantic information utilization and discriminative hash code learning in cross-modal hashing by leveraging matrix factorization and a discrete optimization strategy.

NEUROCOMPUTING (2022)

Article Computer Science, Artificial Intelligence

A feature consistency driven attention erasing network for fine-grained image retrieval

Qi Zhao et al.

Summary: This study proposes a feature consistency driven attention erasing network (FCAENet) for large-scale fine-grained image retrieval, addressing the problems of low accuracy and mapping. By incorporating an adaptive augmentation module and an enhancing space relation loss, FCAENet learns more representative hash codes and achieves state-of-the-art performance in fine-grained image retrieval.

PATTERN RECOGNITION (2022)

Article Engineering, Electrical & Electronic

Hierarchical Feature Aggregation Based on Transformer for Image-Text Matching

Xinfeng Dong et al.

Summary: To improve the accuracy of retrieval across image-text modalities, this study proposes a hierarchical feature aggregation algorithm based on graph convolutional networks. It also uses an attention mechanism and a Transformer to address semantic issues. Experimental results demonstrate the effectiveness of the model.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2022)

Article Computer Science, Artificial Intelligence

Discrete Fusion Adversarial Hashing for cross-modal retrieval

Jing Li et al.

Summary: Deep cross-modal hashing is a flexible and efficient method for large-scale cross-modal retrieval. However, existing methods do not sufficiently consider the semantic gap and distribution shift, and therefore fail to unify hash codes across different modalities. To address these issues, this paper proposes the Discrete Fusion Adversarial Hashing (DFAH) network, which incorporates a Modality-Specific Feature Extractor, a Fusion Learner, and a Modality Discriminator. Additionally, an efficient discrete optimization strategy is designed. Experimental results show that DFAH outperforms state-of-the-art methods in cross-modal retrieval.

KNOWLEDGE-BASED SYSTEMS (2022)

Article Computer Science, Artificial Intelligence

Visual Cluster Grounding for Image Captioning

Wenhui Jiang et al.

Summary: This paper focuses on attention mechanisms in image captioning and proposes a novel grounding model that dynamically links words to informative image regions. The proposed model improves both grounding and captioning performance by capturing linguistic characteristics and visual relevance. Additionally, a new quantitative metric for evaluating the correctness of the attention mechanism is introduced.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2022)

Article Computer Science, Information Systems

DRSL: Deep Relational Similarity Learning for Cross-modal Retrieval

Xu Wang et al.

Summary: The proposed Deep Relational Similarity Learning (DRSL) aims to bridge the heterogeneity gap of different modalities by directly learning the natural pairwise similarities, achieving state-of-the-art results in cross-modal retrieval tasks on benchmark datasets.

INFORMATION SCIENCES (2021)

Article Computer Science, Information Systems

Fine-Grained Visual Textual Alignment for Cross-Modal Retrieval Using Transformer Encoders

Nicola Messina et al.

Summary: This study presents an innovative approach named Transformer Encoder Reasoning and Alignment Network (TERAN) for cross-modal image-sentence matching, achieving state-of-the-art results on image retrieval tasks and surpassing current methods on sentence retrieval tasks on the MS-COCO dataset. TERAN is designed to keep visual and textual data pipelines separate in large-scale retrieval systems, merging information from both domains only during the final alignment phase to pave the way for effective and efficient methods in cross-modal information retrieval.

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-Modal Pretraining

Xunlin Zhan et al.

Summary: Researchers investigate weakly-supervised multimodal instance-level product retrieval among fine-grained product categories and contribute the Product1M dataset, which contains over 1 million image-caption pairs with appealing features like fine-grained categories, complex combinations, and fuzzy correspondence.

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) (2021)

Proceedings Paper Computer Science, Information Systems

Hierarchical Cross-Modal Graph Consistency Learning for Video-Text Retrieval

Weike Jin et al.

Summary: The study proposes a Hierarchical Cross-Modal Graph Consistency Learning Network for video-text retrieval, which considers multi-level graph consistency for video-text matching. Experimental results demonstrate the effectiveness of the approach on different datasets.

SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (2021)

Proceedings Paper Computer Science, Information Systems

Hybrid Fusion with Intra- and Cross-Modality Attention for Image-Recipe Retrieval

Jiao Li et al.

Summary: The study focuses on image-recipe retrieval and proposes a novel framework called Hybrid Fusion with Intra- and Cross-Modality Attention (HF-ICMA) to address the inefficiency of existing methods in integrating key factors effectively. The HF-ICMA model enhances the accuracy of learning image-recipe similarity through intra-recipe fusion and image-recipe fusion modules.

SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (2021)

Proceedings Paper Computer Science, Information Systems

PAN: Prototype-based Adaptive Network for Robust Cross-modal Retrieval

Zhixiong Zeng et al.

Summary: This paper introduces a prototype-based adaptive network (PAN) for robust cross-modal retrieval in real-world applications. The method leverages prototype learning and a prototype propagation strategy to address the issues of imbalanced test queries and training data.

SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (2021)

Article Computer Science, Information Systems

Deep Collaborative Discrete Hashing With Semantic-Invariant Structure Construction

Zijian Wang et al.

Summary: The paper presents a dual-stream learning framework called Deep Collaborative Discrete Hashing (DCDH) which collaboratively constructs a discriminative common discrete space from visual and semantic features, achieving state-of-the-art image retrieval performance in large-scale multimedia retrieval tasks.

IEEE TRANSACTIONS ON MULTIMEDIA (2021)

Article Engineering, Electrical & Electronic

Deep Multiscale Fusion Hashing for Cross-Modal Retrieval

Xiushan Nie et al.

Summary: The study proposes a deep multiscale fusion hashing (DMFH) method for cross-modal retrieval, which designs different network branches and adopts multiscale fusion models to embed multiscale semantics into the final hash codes, making them more representative. DMFH learns common hash codes directly without relaxation, avoiding accuracy loss during hash learning. Experimental results on three benchmark datasets demonstrate its relative superiority.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2021)

Article Computer Science, Artificial Intelligence

Asymmetric Supervised Consistent and Specific Hashing for Cross-Modal Retrieval

Min Meng et al.

Summary: This article introduces a novel supervised cross-modal hashing method ASCSH, which decomposes mapping matrices to exploit correlation between modalities and uses a discrete asymmetric framework to fully explore supervised information, solving binary constraint problems effectively.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2021)

Article Computer Science, Artificial Intelligence

Deep Relation Embedding for Cross-Modal Retrieval

Yifan Zhang et al.

Summary: This work proposes a Cross-modal Relation Guided Network (CRGN) for measuring the similarity between images and text sentences. The relations between image regions are modeled by learning global feature guiding and sentence generation, leading to efficient retrieval between images and text.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2021)

Article Computer Science, Artificial Intelligence

Multi-Task Consistency-Preserving Adversarial Hashing for Cross-Modal Retrieval

De Xie et al.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2020)

Article Computer Science, Artificial Intelligence

Cycle-Consistent Deep Generative Hashing for Cross-Modal Retrieval

Lin Wu et al.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2019)

Article Computer Science, Artificial Intelligence

Unsupervised Semantic-Preserving Adversarial Hashing for Image Search

Cheng Deng et al.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2019)

Article Computer Science, Artificial Intelligence

Generalized Semantic Preserving Hashing for Cross-Modal Retrieval

Devraj Mandal et al.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2019)

Article Computer Science, Information Systems

CM-GANs: Cross-modal Generative Adversarial Networks for Common Representation Learning

Yuxin Peng et al.

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS (2019)

Proceedings Paper Computer Science, Artificial Intelligence

Multi-Label Image Recognition with Graph Convolutional Networks

Zhao-Min Chen et al.

2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) (2019)

Article Computer Science, Information Systems

Generalized Semi-supervised and Structured Subspace Learning for Cross-Modal Retrieval

Liang Zhang et al.

IEEE TRANSACTIONS ON MULTIMEDIA (2018)