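Related References
Note: Only some of the references are listed; download the original article for the complete reference information.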
Article
Computer Science, Artificial Intelligence
Kui Qian et al.
Summary: This study introduces a topic-based multi-channel attention model (TMA) for image caption generation to address the decoupling between visual spatial feature attention and the semantic decoder. By preprocessing caption references, designing a semantic perception network, proposing a multi-channel attention fusion mechanism, and training TMA with a multi-task loss function, the model achieves better evaluation performance with topic-focused attention than state-of-the-art methods.
NEURAL COMPUTING & APPLICATIONS
(2022)
Article
Computer Science, Artificial Intelligence
Peipei Kang et al.
Summary: This paper proposes two deep models based on intra-class low-rank regularization for supervised and semi-supervised cross-modal retrieval, denoted as ILCMR and Semi-ILCMR. ILCMR integrates image and text networks to learn a common feature space by imposing three regularization terms, improving supervised cross-modal retrieval performance. Semi-ILCMR introduces a low-rank constraint in the semi-supervised regime, effectively enhancing semi-supervised cross-modal retrieval performance.
APPLIED INTELLIGENCE
(2022)
Article
Computer Science, Information Systems
Himanshu Sharma et al.
Summary: A novel VQA model based on image captioning is proposed in this paper, which integrates knowledge learned from the image captioning task and transfers it to the VQA task, resulting in improved answer generation accuracy on various VQA datasets.
MULTIMEDIA TOOLS AND APPLICATIONS
(2022)
Article
Computer Science, Information Systems
Zheng Cui et al.
Summary: Image-text retrieval is a challenging task due to the heterogeneous nature of image and text data. To address this, we propose a Cross-modal Alignment with Graph Reasoning (CAGR) model that learns optimized cross-modal features using graph reasoning and an attention mechanism, and computes similarity scores between image and text using a fine-grained alignment approach. Extensive experiments demonstrate the effectiveness of our model.
MULTIMEDIA TOOLS AND APPLICATIONS
(2022)
Article
Computer Science, Artificial Intelligence
Xiangpeng Li et al.
Summary: This study addresses the TextVQA problem and proposes a novel Text-Instance Graph (TIG) network to tackle the challenge. TIG models relationships between objects by building an OCR-OBJ graph and introduces a dynamic OCR-OBJ graph network to handle complex logic questions. Experimental results demonstrate the superior effectiveness of the proposed method compared to existing approaches.
PATTERN RECOGNITION
(2022)
Article
Computer Science, Information Systems
Yuhao Cheng et al.
Summary: Image-text retrieval is a fundamental task in cross-modal research. Existing methods can be classified into independent representation matching and cross-interaction matching. This article proposes a method called CGMN, which explores both intra- and inter-relations without introducing network interaction. The experiments show that CGMN outperforms state-of-the-art methods in image retrieval and is more efficient than interactive matching methods.
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS
(2022)
Article
Computer Science, Information Systems
Wen-Hui Li et al.
Summary: This paper proposes a multi-level representation learning method that improves image-text retrieval by utilizing semantic-level, structural-level, and contextual-level information. The experiments demonstrate the superiority of this method on two commonly used datasets.
INFORMATION PROCESSING & MANAGEMENT
(2021)
Article
Computer Science, Artificial Intelligence
Xin Shu et al.
Summary: In this paper, a novel framework is proposed to integrate semantic correlation and feature correlation for cross-modal retrieval. By using semantic transformation, the model avoids explicitly computing the covariance matrix, which substantially reduces computational cost. Experimental results demonstrate the accuracy and efficiency of the proposed method on three multi-label datasets.
PATTERN RECOGNITION
(2021)
Proceedings Paper
Computer Science, Artificial Intelligence
Hui Yuan et al.
Summary: The Improved Visual Semantic Reasoning model (VSR++) addresses the challenges in fine-grained image-text matching by jointly modeling global alignment and local correspondence. With a suitable learning strategy to balance their importance, the model achieves state-of-the-art performance on two benchmark datasets by distinguishing image regions and text words at a fine-grained level.
2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR)
(2021)
Article
Computer Science, Artificial Intelligence
Han Liu et al.
Summary: The study proposes a new credit assignment method in reinforcement learning algorithms, called vocabulary-wide credit assignment, which assigns appropriate credits to each word in the vocabulary at each generation step. This method has been applied to training image captioning models, leading to better experimental results.
IEEE TRANSACTIONS ON IMAGE PROCESSING
(2021)
Article
Computer Science, Artificial Intelligence
Lin Ma et al.
Article
Computer Science, Artificial Intelligence
Feiran Huang et al.
IEEE TRANSACTIONS ON IMAGE PROCESSING
(2019)
Article
Computer Science, Artificial Intelligence
Yu Liu et al.
PATTERN RECOGNITION
(2019)
Article
Computer Science, Artificial Intelligence
Ranjay Krishna et al.
INTERNATIONAL JOURNAL OF COMPUTER VISION
(2017)
Article
Computer Science, Artificial Intelligence
Ming-Ming Cheng et al.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2015)