4.6 Article

Multi-scale motivated neural network for image-text matching

Related references

Note: Only some of the references are listed; download the original article for the complete reference information.
Article Computer Science, Artificial Intelligence

A topic-based multi-channel attention model under hybrid mode for image caption

Kui Qian et al.

Summary: This study introduces a topic-based multi-channel attention model (TMA) for image caption generation to address the decoupling issue between visual spatial feature attention and semantic decoder. By preprocessing caption references, designing a semantic perception network, proposing a multi-channel attention fusion mechanism, and training TMA with a multi-task loss function, the model shows better evaluation performance with topic-focused attention compared to state-of-the-art methods.

NEURAL COMPUTING & APPLICATIONS (2022)

Article Computer Science, Artificial Intelligence

Intra-class low-rank regularization for supervised and semi-supervised cross-modal retrieval

Peipei Kang et al.

Summary: This paper proposes two deep models based on intra-class low-rank regularization for supervised and semi-supervised cross-modal retrieval, denoted as ILCMR and Semi-ILCMR. ILCMR integrates image and text networks to learn a common feature space by imposing three regularization terms, improving supervised cross-modal retrieval performance. Semi-ILCMR introduces a low-rank constraint in the semi-supervised regime, effectively enhancing semi-supervised cross-modal retrieval performance.

APPLIED INTELLIGENCE (2022)

Article Computer Science, Information Systems

Image captioning improved visual question answering

Himanshu Sharma et al.

Summary: A novel VQA model based on image captioning is proposed in this paper, which integrates knowledge learned from the image captioning task and transfers it to the VQA task, resulting in improved answer generation accuracy on various VQA datasets.

MULTIMEDIA TOOLS AND APPLICATIONS (2022)

Article Computer Science, Information Systems

Cross-modal alignment with graph reasoning for image-text retrieval

Zheng Cui et al.

Summary: Image-text retrieval is a challenging task due to the heterogeneous nature of image and text data. To address this, we propose a Cross-modal Alignment with Graph Reasoning (CAGR) model that learns optimized cross-modal features using graph reasoning and an attention mechanism, and computes similarity scores between image and text using a fine-grained alignment approach. Extensive experiments demonstrate the effectiveness of our model.

MULTIMEDIA TOOLS AND APPLICATIONS (2022)

Article Computer Science, Artificial Intelligence

Text-instance graph: Exploring the relational semantics for text-based visual question answering

Xiangpeng Li et al.

Summary: This study addresses the TextVQA problem and proposes a novel Text-Instance Graph (TIG) network to tackle the challenge. TIG models relationships between objects by building an OCR-OBJ graph and introduces a dynamic OCR-OBJ graph network to handle complex logic questions. Experimental results demonstrate the superior effectiveness of the proposed method compared to existing approaches.

PATTERN RECOGNITION (2022)

Article Computer Science, Information Systems

Cross-modal Graph Matching Network for Image-text Retrieval

Yuhao Cheng et al.

Summary: Image-text retrieval is a fundamental task in cross-modal research. Existing methods can be classified into independent representation matching and cross-interaction matching. This article proposes a method called CGMN, which explores both intra- and inter-relations without introducing network interaction. The experiments show that CGMN outperforms state-of-the-art methods in image retrieval and is more efficient than interactive matching methods.

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS (2022)

Article Computer Science, Information Systems

Multi-level similarity learning for image-text retrieval

Wen-Hui Li et al.

Summary: This paper proposes a multi-level representation learning method to improve the quality of image-text retrieval by utilizing semantic-level, structural-level, and contextual-level information. Experiments on two commonly used datasets demonstrate the superiority of this method.

INFORMATION PROCESSING & MANAGEMENT (2021)

Article Computer Science, Artificial Intelligence

Scalable multi-label canonical correlation analysis for cross-modal retrieval

Xin Shu et al.

Summary: In this paper, a novel framework is proposed to integrate semantic correlation and feature correlation for cross-modal retrieval. By using semantic transformation, the model avoids explicitly computing the covariance matrix, which leads to a huge saving of computational cost. Experimental results demonstrate the accuracy and efficiency of the proposed method on three multi-label datasets.

PATTERN RECOGNITION (2021)

Proceedings Paper Computer Science, Artificial Intelligence

VSR++: Improving Visual Semantic Reasoning for Fine-Grained Image-Text Matching

Hui Yuan et al.

Summary: The Improved Visual Semantic Reasoning model (VSR++) addresses the challenges in fine-grained image-text matching by jointly modeling global alignment and local correspondence. With a suitable learning strategy to balance their importance, the model achieves state-of-the-art performance on two benchmark datasets by distinguishing image regions and text words at a fine-grained level.

2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR) (2021)

Article Computer Science, Artificial Intelligence

Vocabulary-Wide Credit Assignment for Training Image Captioning Models

Han Liu et al.

Summary: The study proposes a new credit assignment method in reinforcement learning algorithms, called vocabulary-wide credit assignment, which assigns appropriate credits to each word in the vocabulary at each generation step. This method has been applied to training image captioning models, leading to better experimental results.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2021)

Article Computer Science, Artificial Intelligence

Bidirectional image-sentence retrieval by local and global deep matching

Lin Ma et al.

NEUROCOMPUTING (2019)

Article Computer Science, Artificial Intelligence

Bi-Directional Spatial-Semantic Attention Networks for Image-Text Matching

Feiran Huang et al.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2019)

Article Computer Science, Artificial Intelligence

CycleMatch: A cycle-consistent embedding network for image-text matching

Yu Liu et al.

PATTERN RECOGNITION (2019)

Article Computer Science, Artificial Intelligence

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

Ranjay Krishna et al.

INTERNATIONAL JOURNAL OF COMPUTER VISION (2017)

Article Computer Science, Artificial Intelligence

Global Contrast Based Salient Region Detection

Ming-Ming Cheng et al.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2015)