Related references
Note: only a partial list of references is shown here; download the original document for complete bibliographic information.
Article
Computer Science, Artificial Intelligence
Shengsheng Qian et al.
Summary: With the increasing amount of multimodal data, cross-modal retrieval has become a hot research topic, but existing techniques have limitations in eliminating modality heterogeneity, considering label relationships, and efficiently aligning representation and label similarity. To address these problems, this article proposes two models that use dual generative adversarial networks to project multimodal data into a common representation space, employ multi-hop graph neural networks to model label relation dependencies, and introduce a novel soft multi-label contrastive loss to align representation and label similarity. Experimental results on three benchmark datasets demonstrate the superiority of the proposed method.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)
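As an illustrative aside to the summary above: a "soft multi-label contrastive loss" can be sketched by using the cosine similarity between multi-hot label vectors as a soft target distribution and matching the softmax over embedding similarities against it. This is a generic sketch under those assumptions, not Qian et al.'s actual formulation (which requires the original article); both function names are hypothetical.

```python
import numpy as np

def soft_label_similarity(labels: np.ndarray) -> np.ndarray:
    """Cosine similarity between multi-hot label vectors; values in [0, 1]
    act as soft targets (1.0 = identical label sets, 0.0 = disjoint)."""
    norms = np.linalg.norm(labels, axis=1, keepdims=True)
    unit = labels / np.clip(norms, 1e-12, None)
    return unit @ unit.T

def soft_contrastive_loss(embeddings: np.ndarray, labels: np.ndarray,
                          temperature: float = 0.1) -> float:
    """Cross-entropy between the label-similarity target distribution and
    the softmax over embedding similarities, pushing representation
    similarity toward label similarity."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    n = len(z)
    mask = ~np.eye(n, dtype=bool)                      # exclude self-pairs
    logits = np.where(mask, (z @ z.T) / temperature, -np.inf)
    # Normalize label similarities into a per-row target distribution.
    targets = soft_label_similarity(labels) * mask
    targets = targets / np.clip(targets.sum(axis=1, keepdims=True), 1e-12, None)
    # Numerically stable log-softmax over each row.
    m = logits.max(axis=1, keepdims=True)
    log_probs = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    return float(-(targets * np.where(mask, log_probs, 0.0)).sum(axis=1).mean())
```

The loss is lowest when samples sharing labels sit close in the embedding space and label-disjoint samples sit far apart.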
Article
Engineering, Electrical & Electronic
Chenggang Yan et al.
Summary: This paper proposes a Task-Adaptive Attention module for image captioning, which learns non-visual clues to address the misleading issue in attention models during word generation. The module is further enhanced with diversity regularization to improve expression ability. Experimental results on MSCOCO captioning dataset show that the module improves the performance of a vanilla Transformer-based image captioning model.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
(2022)
Article
Computer Science, Hardware & Architecture
Weian Li et al.
Summary: In this paper, a hashing method that considers both modality-specific and cross-modal semantic information is proposed. The method obtains hash codes via a closed-form solution, enabling fast retrieval, and outperforms current methods in terms of mean average precision (mAP).
COMPUTERS & ELECTRICAL ENGINEERING
(2022)
Article
Engineering, Electrical & Electronic
Xinfeng Dong et al.
Summary: The completeness of semantic expression plays a crucial role in cross-modal retrieval tasks. By utilizing a graph convolutional network, the proposed model obtains complementary semantic information and strengthens the similarities between samples with the same semantics. Experimental results on three benchmark datasets demonstrate the superiority of the model over other state-of-the-art methods.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
(2022)
Article
Computer Science, Artificial Intelligence
Zhuoyi Li et al.
Summary: This paper proposes a novel image-text Bidirectional Learning Network (BLN) based cross-modal retrieval method. It constructs a common representation space and measures the similarity of heterogeneous data. The method utilizes a multi-layer supervision network and a bidirectional crisscross loss function to preserve modal invariance. Experimental results demonstrate the effectiveness and superiority of the proposed method over existing cross-modal retrieval methods.
Article
Computer Science, Hardware & Architecture
Lei Zhang et al.
Summary: In this paper, a novel semi-supervised method called Semi-supervised Constrained Graph Convolutional Network (SCGCN) is proposed to exploit correlations among batches of samples from different modalities. The method combines deep supervised learning with unsupervised learning to reduce the requirement for labeled data. By utilizing deep neural networks and graph convolutional networks, the method learns a modality-invariant semantic space and generates predicted labels for unlabeled data. Extensive experiments demonstrate the effectiveness of the proposed approach.
COMPUTERS & ELECTRICAL ENGINEERING
(2022)
Article
Computer Science, Information Systems
Xiao Shen et al.
Summary: This study proposes a novel multi-view graph cross-modal hashing (MGCH) method for generating hash codes in a semi-supervised manner. Unlike conventional graph-based hashing methods, MGCH uses multi-view graphs only as auxiliary learning signals and demonstrates superiority in cross-modal hashing tasks.
INFORMATION SCIENCES
(2022)
Article
Computer Science, Artificial Intelligence
Yun Liu et al.
Summary: The proposed Dual Semantic Preserving Hashing (DSPH) method addresses the challenges of semantic information utilization and discriminative hash code learning in cross-modal hashing by leveraging matrix factorization and discrete optimization strategy.
Article
Computer Science, Artificial Intelligence
Qi Zhao et al.
Summary: In this study, a feature consistency driven attention erasing network (FCAENet) is proposed for large-scale fine-grained image retrieval, addressing the issues of low accuracy and the mapping problem. By incorporating an adaptive augmentation module and an enhancing space relation loss, FCAENet learns more representative hash codes and achieves state-of-the-art performance in fine-grained image retrieval.
PATTERN RECOGNITION
(2022)
Article
Engineering, Electrical & Electronic
Xinfeng Dong et al.
Summary: To improve the accuracy of retrieval across image-text modalities, this study proposes a hierarchical feature aggregation algorithm based on graph convolutional networks. It also utilizes an attention mechanism and a Transformer to address semantic issues. Experimental results demonstrate the effectiveness of this model.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
(2022)
Article
Computer Science, Artificial Intelligence
Jing Li et al.
Summary: Deep cross-modal hashing is a flexible and efficient approach to large-scale cross-modal retrieval. However, existing methods do not sufficiently consider the semantic gap and distribution shift, and thus fail to unify hash codes across different modalities. To address these issues, this paper proposes the Discrete Fusion Adversarial Hashing (DFAH) network, which incorporates a Modality-Specific Feature Extractor, a Fusion Learner, and a Modality Discriminator. Additionally, an efficient discrete optimization strategy is designed. Experimental results show that DFAH outperforms state-of-the-art methods in cross-modal retrieval.
KNOWLEDGE-BASED SYSTEMS
(2022)
Article
Computer Science, Artificial Intelligence
Wenhui Jiang et al.
Summary: This paper focuses on attention mechanisms in image captioning and proposes a novel grounding model that dynamically links words to informative image regions. The proposed model improves both grounding and captioning performance by capturing linguistic characteristics and visual relevance. Additionally, a new quantitative metric for evaluating the correctness of the attention mechanism is introduced.
IEEE TRANSACTIONS ON IMAGE PROCESSING
(2022)
Article
Computer Science, Information Systems
Xu Wang et al.
Summary: The proposed Deep Relational Similarity Learning (DRSL) aims to bridge the heterogeneity gap of different modalities by directly learning the natural pairwise similarities, achieving state-of-the-art results in cross-modal retrieval tasks on benchmark datasets.
INFORMATION SCIENCES
(2021)
Article
Computer Science, Information Systems
Nicola Messina et al.
Summary: This study presents an innovative approach named Transformer Encoder Reasoning and Alignment Network (TERAN) for cross-modal image-sentence matching, achieving state-of-the-art results on image retrieval and surpassing current methods on sentence retrieval on the MS-COCO dataset. TERAN keeps the visual and textual data pipelines separate in large-scale retrieval systems, merging information from the two domains only during the final alignment phase, paving the way for effective and efficient cross-modal information retrieval.
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS
(2021)
Proceedings Paper
Computer Science, Artificial Intelligence
Xunlin Zhan et al.
Summary: Researchers investigate weakly-supervised multimodal instance-level product retrieval among fine-grained product categories and contribute the Product1M dataset, which contains over 1 million image-caption pairs with appealing features like fine-grained categories, complex combinations, and fuzzy correspondence.
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021)
(2021)
Proceedings Paper
Computer Science, Information Systems
Weike Jin et al.
Summary: This study proposes a Hierarchical Cross-Modal Graph Consistency Learning Network for video-text retrieval, which considers multi-level graph consistency for video-text matching. Experimental results on several datasets demonstrate the effectiveness of the approach.
SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL
(2021)
Proceedings Paper
Computer Science, Information Systems
Jiao Li et al.
Summary: This study focuses on image-recipe retrieval and proposes a novel framework called Hybrid Fusion with Intra- and Cross-Modality Attention (HF-ICMA) to address the failure of existing methods to integrate key factors effectively. The HF-ICMA model improves the accuracy of learned image-recipe similarity through intra-recipe fusion and image-recipe fusion modules.
SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL
(2021)
Proceedings Paper
Computer Science, Information Systems
Zhixiong Zeng et al.
Summary: This paper introduces a prototype-based adaptive network (PAN) for robust cross-modal retrieval in real-world applications. The method leverages prototype learning and a prototype propagation strategy to address the issues of imbalanced test queries and training data.
SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL
(2021)
Article
Computer Science, Information Systems
Zijian Wang et al.
Summary: The paper presents a dual-stream learning framework called Deep Collaborative Discrete Hashing (DCDH) which collaboratively constructs a discriminative common discrete space from visual and semantic features, achieving state-of-the-art image retrieval performance in large-scale multimedia retrieval tasks.
IEEE TRANSACTIONS ON MULTIMEDIA
(2021)
Article
Engineering, Electrical & Electronic
Xiushan Nie et al.
Summary: This study proposes a deep multiscale fusion hashing (DMFH) method for cross-modal retrieval, which designs different network branches and adopts multiscale fusion models to embed multiscale semantics into the final hash codes, making them more representative. DMFH learns common hash codes directly without relaxation, avoiding accuracy loss during hash learning, and experimental results on three benchmark datasets demonstrate its superiority.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
(2021)
Article
Computer Science, Artificial Intelligence
Min Meng et al.
Summary: This article introduces ASCSH, a novel supervised cross-modal hashing method that decomposes the mapping matrices to exploit the correlation between modalities and uses a discrete asymmetric framework to fully exploit supervised information, solving the binary constraint problem effectively.
IEEE TRANSACTIONS ON IMAGE PROCESSING
(2021)
Article
Computer Science, Artificial Intelligence
Yifan Zhang et al.
Summary: Cross-modal retrieval is achieved through a Cross-modal Relation Guided Network (CRGN) that measures the similarity between images and text sentences. By learning global feature guidance and sentence generation, the relations between image regions are modeled, enabling efficient retrieval between image and text.
IEEE TRANSACTIONS ON IMAGE PROCESSING
(2021)
Article
Computer Science, Artificial Intelligence
De Xie et al.
IEEE TRANSACTIONS ON IMAGE PROCESSING
(2020)
Article
Computer Science, Artificial Intelligence
Lin Wu et al.
IEEE TRANSACTIONS ON IMAGE PROCESSING
(2019)
Article
Computer Science, Artificial Intelligence
Cheng Deng et al.
IEEE TRANSACTIONS ON IMAGE PROCESSING
(2019)
Article
Computer Science, Artificial Intelligence
Devraj Mandal et al.
IEEE TRANSACTIONS ON IMAGE PROCESSING
(2019)
Article
Computer Science, Information Systems
Yuxin Peng et al.
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS
(2019)
Proceedings Paper
Computer Science, Artificial Intelligence
Zhao-Min Chen et al.
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019)
(2019)
Article
Computer Science, Information Systems
Liang Zhang et al.
IEEE TRANSACTIONS ON MULTIMEDIA
(2018)