Article

Rare-aware attention network for image-text matching

Related references

Note: only some of the references are listed; download the original article for the complete reference information.
Article Computer Science, Artificial Intelligence

Learning Relation Prototype From Unlabeled Texts for Long-Tail Relation Extraction

Yixin Cao et al.

Summary: Relation Extraction (RE) is important for completing Knowledge Graphs (KGs) by extracting entity relations from texts. However, it often faces the long-tail issue, because many relation types have little training data. This paper proposes a general approach to learn relation prototypes from unlabeled texts, transferring knowledge from relation types with sufficient training data to facilitate long-tail relation extraction. Experimental results demonstrate the effectiveness of the learned relation prototypes, and the proposed model outperforms state-of-the-art baselines.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2023)

Article Computer Science, Information Systems

Cognitive multi-modal consistent hashing with flexible semantic transformation

Junfeng An et al.

Summary: The proposed Cognitive Multi-modal Consistent Hashing (CMCH) framework tackles the challenge of exploring and preserving semantic-consistent information across multiple modalities in large-scale social geo-media multimedia data. By incorporating collaborative multi-modal fusion and deep semantic transform learning, CMCH outperforms existing methods in multi-modal information retrieval and computational efficiency.

INFORMATION PROCESSING & MANAGEMENT (2022)

Article Computer Science, Information Systems

Cross-modal image-text search via Efficient Discrete Class Alignment Hashing

Song Wang et al.

Summary: This study proposes a novel discrete supervised hashing method that integrates class alignment and matrix factorization to simultaneously learn hash codes and hash functions. Furthermore, an improved two-step hashing strategy is introduced to enhance learning efficiency.

INFORMATION PROCESSING & MANAGEMENT (2022)

Article Engineering, Electrical & Electronic

Dual-Level Representation Enhancement on Characteristic and Context for Image-Text Retrieval

Song Yang et al.

Summary: This paper proposes a dual-level representation enhancement network (DREN) to improve image-text retrieval. By exploring characteristics and contexts of regions and words in a joint manner, accurate matching of image-text pairs is achieved, leading to superior retrieval performance.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2022)

Article Engineering, Electrical & Electronic

Region-Aware Image Captioning via Interaction Learning

An-An Liu et al.

Summary: Image captioning, one of the primary goals in computer vision, aims to automatically generate natural descriptions for images. This paper proposes a region-aware interaction learning method to explicitly capture the semantic correlations between regions and objects for word inference, effectively capturing contextual information.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2022)

Article Computer Science, Information Systems

Cross-Modal Multitask Transformer for End-to-End Multimodal Aspect-Based Sentiment Analysis

Li Yang et al.

Summary: In this paper, we propose a multi-task learning framework called CMMT for End-to-End Multimodal Aspect-Based Sentiment Analysis. Experimental results demonstrate that CMMT consistently outperforms the state-of-the-art approach JML and achieves superior performance in aspect extraction and sentiment classification compared to other systems.

INFORMATION PROCESSING & MANAGEMENT (2022)

Article Computer Science, Information Systems

BCMF: A bidirectional cross-modal fusion model for fake news detection

Chuanming Yu et al.

Summary: In recent years, fake news detection has gained much attention. However, most current approaches only utilize features from a single modality, neglecting the importance of comprehensive fusion between features of different modalities. In this study, we propose a novel model called BCMF, which comprehensively integrates textual and visual representations in a bidirectional manner, achieving improved classification accuracy on various datasets.

INFORMATION PROCESSING & MANAGEMENT (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Negative-Aware Attention Framework for Image-Text Matching

Kun Zhang et al.

Summary: This paper proposes a novel Negative-Aware Attention Framework (NAAF) for image-text matching, which utilizes both the positive effect of matched fragments and the negative effect of mismatched fragments to improve the performance.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

EASE: Unsupervised Discriminant Subspace Learning for Transductive Few-Shot Learning

Hao Zhu et al.

Summary: The study introduces EASE and SIAMESE methods, which focus on linear projection for transductive few-shot learning and extending clustering with labeled support samples. These approaches improve FSL performance by enhancing feature generation and query predictions.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Retrieval Augmented Classification for Long-Tail Visual Recognition

Alexander Long et al.

Summary: The article introduces a generic method called Retrieval Augmented Classification (RAC) to enhance image classification pipelines with an explicit retrieval module. RAC shows significant improvement in long-tail classification, and its retrieval module achieves high accuracy on tail classes, allowing the base encoder to focus on common classes.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) (2022)

Proceedings Paper Computer Science, Information Systems

Dynamic Modality Interaction Modeling for Image-Text Retrieval

Leigang Qu et al.

Summary: The paper proposes a novel multimodal interaction modeling network based on the routing mechanism for image-text retrieval. By designing different levels of modality interaction units and connecting them to construct a routing space, the model can dynamically learn different activated paths for different data.

SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (2021)

Article Computer Science, Artificial Intelligence

Fs-DSM: Few-Shot Diagram-Sentence Matching via Cross-Modal Attention Graph Model

Xin Hu et al.

Summary: The study presents a cross-modal attention graph model, Fs-DSM, for the few-shot diagram-sentence matching task. Using graph initialization, information propagation, and global association modules, it achieves superior performance on the AI2D diagram dataset and on two public benchmark datasets of natural images.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Self-Distillation for Few-Shot Image Captioning

Xianyu Chen et al.

Summary: This paper proposes an ensemble-based self-distillation method for few-shot image captioning, allowing models to be trained with unpaired images and captions. The method shows significant performance improvements on different models and datasets with only a small amount of annotated training data.

2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021) (2021)

Article Computer Science, Artificial Intelligence

Learning Person Re-Identification Models From Videos With Weak Supervision

Xueping Wang et al.

Summary: This study introduces a method for learning person re-identification models from videos with weak supervision, utilizing a multiple instance attention learning framework. By focusing on video-level labels, this approach outperforms traditional supervised methods in person re-identification tasks.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2021)

Article Computer Science, Artificial Intelligence

A Pairwise Attentive Adversarial Spatiotemporal Network for Cross-Domain Few-Shot Action Recognition-R2

Zan Gao et al.

Summary: This study proposes a novel adversarial network to address the cross-domain few-shot action recognition task, which integrates spatial-temporal information acquisition, few-shot learning, and video domain adaptation in a unified framework, achieving superior performance on public datasets.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2021)

Article Computer Science, Information Systems

Label consistent locally linear embedding based cross-modal hashing

Hui Zeng et al.

INFORMATION PROCESSING & MANAGEMENT (2020)

Article Computer Science, Artificial Intelligence

Image and Sentence Matching via Semantic Concepts and Order Learning

Yan Huang et al.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2020)

Article Computer Science, Information Systems

Dual-path Convolutional Image-Text Embeddings with Instance Loss

Zhedong Zheng et al.

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS (2020)

Article Computer Science, Information Systems

Global context and boundary structure-guided network for cross-modal organ segmentation

Xiaonan Guo et al.

INFORMATION PROCESSING & MANAGEMENT (2020)

Article Computer Science, Information Systems

Semantic-rebased cross-modal hashing for scalable unsupervised text-visual retrieval

Weiwei Wang et al.

INFORMATION PROCESSING & MANAGEMENT (2020)

Article Computer Science, Artificial Intelligence

Scalable Deep Hashing for Large-Scale Social Image Retrieval

Hui Cui et al.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2020)

Article Computer Science, Artificial Intelligence

Learning Two-Branch Neural Networks for Image-Text Matching Tasks

Liwei Wang et al.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2019)

Proceedings Paper Engineering, Electrical & Electronic

Self-paced Adversarial Training for Multimodal Few-shot Learning

Frederik Pahde et al.

2019 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV) (2019)

Proceedings Paper Computer Science, Artificial Intelligence

Learning Cross-Modal Embeddings with Adversarial Networks for Cooking Recipes and Food Images

Hao Wang et al.

2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) (2019)

Proceedings Paper Computer Science, Artificial Intelligence

End-to-end Convolutional Semantic Embeddings

Quanzeng You et al.

2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) (2018)

Article Computer Science, Artificial Intelligence

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren et al.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2017)

Proceedings Paper Computer Science, Theory & Methods

Cross-media Retrieval by Learning Rich Semantic Embeddings of Multimedia

Mengdi Fan et al.

PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17) (2017)

Article Computer Science, Information Systems

Person Reidentification via Ranking Aggregation of Similarity Pulling and Dissimilarity Pushing

Mang Ye et al.

IEEE TRANSACTIONS ON MULTIMEDIA (2016)