Transformer-based local-global guidance for image captioning

Related references

Note: Only some of the references are listed.
Article Computer Science, Theory & Methods

Transformers in Vision: A Survey

Salman Khan et al.

Summary: Transformer models have shown impressive results in computer vision tasks by modeling long-range dependencies, supporting parallel processing, and handling multi-modal data. They are widely used in visual recognition, generative modeling, multi-modal tasks, video processing, low-level vision, and three-dimensional analysis, showcasing their strengths in scalability and flexibility.

ACM COMPUTING SURVEYS (2022)

Article Mathematics, Applied

Online reinforcement learning multiplayer non-zero sum games of continuous-time Markov jump linear systems

Xilin Xin et al.

Summary: This paper proposes a novel online mode-free integral reinforcement learning algorithm to solve multiplayer non-zero sum games. By collecting and learning subsystem information of states and inputs, and using online learning to compute the corresponding N-coupled algebraic Riccati equations, the policy iteration algorithm presented in this paper can solve the coupled algebraic Riccati equations of multiplayer non-zero sum games. The effectiveness and feasibility of the design method are verified through a simulation example involving three players.

APPLIED MATHEMATICS AND COMPUTATION (2022)

Article Automation & Control Systems

Sequential Transformer via an Outside-In Attention for image captioning

Yiwei Wei et al.

Summary: This study introduces an Outside-in Attention mechanism to address the limitations of recurrent attention and self-attention in image captioning tasks. By incorporating the advantages of both transformers and recurrent networks, competitive results are achieved.

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE (2022)

Article Computer Science, Artificial Intelligence

On Diversity in Image Captioning: Metrics and Methods

Qingzhong Wang et al.

Summary: This paper proposes a metric to measure the diversity of a set of image captions and explores methods for training caption models to generate diverse captions. The proposed diversity metric shows a strong correlation with human evaluation, and the experiments demonstrate the performance differences in terms of diversity and accuracy. Different techniques, such as reinforcement learning and ensemble matrix, are employed to improve the diversity and accuracy of generated captions.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2022)

Article Computer Science, Artificial Intelligence

Context-Aware Visual Policy Network for Fine-Grained Image Captioning

Zheng-Jun Zha et al.

Summary: The study focuses on image captioning and proposes a Context-Aware Visual Policy network (CAVP) to capture visual context, improving the richness and detail of image descriptions. CAVP explicitly considers previous visual attentions as context during captioning, allowing for the generation of more complex and detailed sentence and paragraph descriptions.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2022)

Review Computer Science, Artificial Intelligence

A thorough review of models, evaluation metrics, and datasets on image captioning

Gaifang Luo et al.

Summary: This survey provides a comprehensive overview of image captioning methods, categorizing them based on techniques and discussing their advantages and limitations. By quantitatively comparing related state-of-the-art studies, recent trends and future directions in image captioning are determined. The ultimate goal is to serve as a tool for understanding existing literature and highlighting future directions in the field for the benefit of Computer Vision and Natural Language Processing communities.

IET IMAGE PROCESSING (2022)

Article Computer Science, Artificial Intelligence

End-to-End Supermask Pruning: Learning to Prune Image Captioning Models

Jia Huei Tan et al.

Summary: For the first time in image captioning research, an extensive comparison of various unstructured weight pruning methods is provided on three different popular image captioning architectures. A novel end-to-end weight pruning method is proposed, showing that an 80% to 95% sparse network can match or outperform its dense counterpart. Pre-trained models achieving high CIDEr scores on the MSCOCO dataset with significantly reduced model size are publicly available.

PATTERN RECOGNITION (2022)

Article Computer Science, Artificial Intelligence

Revisiting image captioning via maximum discrepancy competition

Boyang Wan et al.

Summary: By evaluating the generalization ability of existing image captioning models (ICMs), we found that combining low- and high-level object features, a self-attention mechanism, and a multi-stage language decoder could be an effective way to improve the performance of ICMs.

PATTERN RECOGNITION (2022)

Article Computer Science, Artificial Intelligence

Protect, show, attend and tell: Empowering image captioning models with ownership protection

Jian Han Lim et al.

Summary: This paper proposes two different embedding schemes in a recurrent neural network to protect image captioning models, which do not compromise the original performance and can withstand both removal and ambiguity attacks.

PATTERN RECOGNITION (2022)

Article Automation & Control Systems

Chinese Image Caption Generation via Visual Attention and Topic Modeling

Maofu Liu et al.

Summary: Automatic image captioning is a complex research issue in the field of artificial intelligence, involving computer vision and natural language processing. Despite the remarkable performance of the neural image caption (NIC) model, there are still challenges in achieving accurate and diverse image captions, as well as overcoming the deviation and monotony issues. The distinction between Chinese and English in syntax and semantics necessitates the development of specialized Chinese image caption generation methods. Our NICVATP2L model, incorporating visual attention and topic modeling, effectively addresses these challenges and outperforms existing NIC models.

IEEE TRANSACTIONS ON CYBERNETICS (2022)

Article Computer Science, Artificial Intelligence

Human-Centric Image Captioning

Zuopeng Yang et al.

Summary: This paper introduces a Human-Centric Captioning Model focused on describing human behavior in images. By creating a specialized COCO dataset and developing the HCCM model, the study achieves state-of-the-art performance in understanding and describing diverse human activities.

PATTERN RECOGNITION (2022)

Article Computer Science, Artificial Intelligence

Repurposing existing deep networks for caption and aesthetic-guided image cropping

Nora Horanyi et al.

Summary: This study proposes a novel optimization framework that optimizes image cropping parameters based on user description and aesthetics. Instead of training a separate network, networks pre-trained on image captioning and aesthetic tasks are repurposed. The framework employs three strategies to ensure stable optimization and produces crops that align with user descriptions and aesthetics.

PATTERN RECOGNITION (2022)

Article Engineering, Multidisciplinary

Few shot cross equipment fault diagnosis method based on parameter optimization and feature metric

Hongfeng Tao et al.

Summary: With the rapid development of industrial informatization and deep learning technology, modern data-driven fault diagnosis methods based on deep learning have attracted attention from the industry. However, the scarcity of fault samples in actual industrial environments and cross-domain problems between different devices limit the development of these methods. This paper proposes a model unknown matching network model for fault diagnosis with few samples, which combines parameter optimization and feature metric to address these limitations and achieves promising results in experiments.

MEASUREMENT SCIENCE AND TECHNOLOGY (2022)

Article Computer Science, Artificial Intelligence

Geometry Attention Transformer with position-aware LSTMs for image captioning

Chi Wang et al.

Summary: This paper proposes an improved Geometry Attention Transformer (GAT) framework for image captioning, which incorporates a geometry gate-controlled self-attention refiner and position-LSTMs to enhance performance. Experimental results demonstrate that the GAT outperforms current state-of-the-art image captioning models.

EXPERT SYSTEMS WITH APPLICATIONS (2022)

Article Computer Science, Information Systems

Dual Attention on Pyramid Feature Maps for Image Captioning

Litao Yu et al.

Summary: This paper proposes a method of applying dual attention on pyramid image feature maps to improve the quality of generated sentences from images. The method achieves impressive results on multiple datasets and has a highly modular nature, making it easily applicable to other image captioning modules.

IEEE TRANSACTIONS ON MULTIMEDIA (2022)

Article Computer Science, Artificial Intelligence

Visual enhanced gLSTM for image captioning

Jing Zhang et al.

Summary: A visual enhanced gLSTM model is proposed for image caption generation in this paper, which utilizes visual features from the region of interest in images as guiding information to improve the accuracy of image captions. Experimental results show that the proposed method outperforms the baseline gLSTM algorithm and other popular image captioning methods in terms of caption accuracy.

EXPERT SYSTEMS WITH APPLICATIONS (2021)

Article Computer Science, Artificial Intelligence

Divergent-convergent attention for image captioning

Junzhong Ji et al.

Summary: A novel divergent-convergent attention (DCA) model is proposed to address the issues in current attention-based image captioning methods. By utilizing multi-perspective inputs and adaptive attention merging, the model achieves more precise focus on local image regions and generates more descriptive sentences. The interaction between visual and semantic components contributes to the model's superior performance on the MS COCO dataset.

PATTERN RECOGNITION (2021)

Article Computer Science, Artificial Intelligence

Image captioning with transformer and knowledge graph

Yu Zhang et al.

Summary: This paper applies the Transformer model to image captioning tasks and improves its performance in two aspects by adding a KL divergence term and leveraging knowledge graphs. Experimental results on benchmark datasets show the effectiveness of the proposed method.

PATTERN RECOGNITION LETTERS (2021)

Article Computer Science, Information Systems

Attention-guided image captioning with adaptive global and local feature fusion

Xian Zhong et al.

Summary: The proposed image captioning scheme, based on adaptive spatial information attention (ASIA), effectively extracts spatial information of salient objects and uses different techniques in the encoding and decoding stages. Extensive experiments on two datasets show that it improves captioning performance.

JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION (2021)

Article Computer Science, Information Systems

Multi-Gate Attention Network for Image Captioning

Weitao Jiang et al.

Summary: The paper proposes a novel Multi-Gate Attention Network (MGAN) for image captioning, integrating Multi-Gate Attention (MGA) blocks to enhance feature representation and capture relevant information. Experiments show that MGAN outperforms most state-of-the-art methods on the MS COCO dataset, demonstrating its generalizability when combined with other methods incorporating MGA blocks.

IEEE ACCESS (2021)

Article Computer Science, Information Systems

Fine-Grained Image Captioning With Global-Local Discriminative Objective

Jie Wu et al.

Summary: In the field of image captioning, a novel global-local discriminative objective is proposed to generate fine-grained descriptive captions. The method outperforms baseline methods on the widely used MS-COCO dataset and competes with existing leading approaches.

IEEE TRANSACTIONS ON MULTIMEDIA (2021)

Article Computer Science, Information Systems

Integrating Part of Speech Guidance for Image Captioning

Ji Zhang et al.

Summary: The paper proposes an integrated image captioning method that incorporates part of speech information, using a part of speech prediction network within an encoder-decoder framework, and multi-task learning to generate captions with more accurate visual information and better compliance with language habits and grammar rules.

IEEE TRANSACTIONS ON MULTIMEDIA (2021)

Article Computer Science, Artificial Intelligence

Learning visual relationship and context-aware attention for image captioning

Junbo Wang et al.

PATTERN RECOGNITION (2020)

Article Computer Science, Information Systems

Exploring temporal representations by leveraging attention-based bidirectional LSTM-RNNs for multi-modal emotion recognition

Chao Li et al.

INFORMATION PROCESSING & MANAGEMENT (2020)

Article Computer Science, Hardware & Architecture

avtmNet: Adaptive Visual-Text Merging Network for Image Captioning

Heng Song et al.

COMPUTERS & ELECTRICAL ENGINEERING (2020)

Article Engineering, Electrical & Electronic

Multimodal Transformer With Multi-View Visual Representation for Image Captioning

Jun Yu et al.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2020)

Article Computer Science, Artificial Intelligence

An Ensemble of Generation- and Retrieval-Based Image Captioning With Dual Generator Generative Adversarial Network

Min Yang et al.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2020)

Article Computer Science, Artificial Intelligence

Image captioning with semantic-enhanced features and extremely hard negative examples

Wenjie Cai et al.

NEUROCOMPUTING (2020)

Article Computer Science, Artificial Intelligence

Re-Caption: Saliency-Enhanced Image Captioning Through Two-Phase Learning

Lian Zhou et al.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2020)

Article Computer Science, Artificial Intelligence

Image Captioning With End-to-End Attribute Detection and Subsequent Attributes Prediction

Yiqing Huang et al.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2020)

Article Computer Science, Interdisciplinary Applications

Image Captioning using Reinforcement Learning with BLUDEr Optimization

P. R. Devi et al.

PATTERN RECOGNITION AND IMAGE ANALYSIS (2020)

Article Computer Science, Software Engineering

A survey on deep neural network-based image captioning

Xiaoxiao Liu et al.

VISUAL COMPUTER (2019)

Article Computer Science, Information Systems

Multitask Learning for Cross-Domain Image Captioning

Min Yang et al.

IEEE TRANSACTIONS ON MULTIMEDIA (2019)

Article Computer Science, Artificial Intelligence

Topic-Oriented Image Captioning Based on Order-Embedding

Niange Yu et al.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2019)

Article Computer Science, Artificial Intelligence

Dense semantic embedding network for image captioning

Xinyu Xiao et al.

PATTERN RECOGNITION (2019)

Article Computer Science, Artificial Intelligence

Neural Image Caption Generation with Weighted Training and Reference

Guiguang Ding et al.

COGNITIVE COMPUTATION (2019)

Article Chemistry, Multidisciplinary

Boosted Transformer for Image Captioning

Jiangyun Li et al.

APPLIED SCIENCES-BASEL (2019)

Article Computer Science, Artificial Intelligence

Attentive Linear Transformation for Image Captioning

Senmao Ye et al.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2018)

Article Computer Science, Information Systems

GLA: Global-Local Attention for Image Description

Linghui Li et al.

IEEE TRANSACTIONS ON MULTIMEDIA (2018)

Article Computer Science, Artificial Intelligence

A survey on automatic image caption generation

Shuang Bai et al.

NEUROCOMPUTING (2018)

Article Computer Science, Artificial Intelligence

Deep sequential fusion LSTM network for image description

Pengjie Tang et al.

NEUROCOMPUTING (2018)

Article Chemistry, Multidisciplinary

Captioning Transformer with Stacked Attention Modules

Xinxin Zhu et al.

APPLIED SCIENCES-BASEL (2018)

Article Computer Science, Artificial Intelligence

Image captioning with triple-attention and stack parallel LSTM

Xinxin Zhu et al.

NEUROCOMPUTING (2018)

Article Computer Science, Artificial Intelligence

Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge

Oriol Vinyals et al.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2017)

Article Computer Science, Artificial Intelligence

Large Scale Retrieval and Generation of Image Descriptions

Vicente Ordonez et al.

INTERNATIONAL JOURNAL OF COMPUTER VISION (2016)

Article Automation & Control Systems

Joint state and parameter robust estimation of stochastic nonlinear systems

Vladimir Stojanovic et al.

INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL (2016)

Article Computer Science, Artificial Intelligence

BabyTalk: Understanding and Generating Simple Image Descriptions

Girish Kulkarni et al.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2013)

Article Computer Science, Artificial Intelligence

Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics

Micah Hodosh et al.

JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH (2013)

Article Computer Science, Artificial Intelligence

Object Detection with Discriminatively Trained Part-Based Models

Pedro F. Felzenszwalb et al.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2010)

Article Computer Science, Artificial Intelligence

Framewise phoneme classification with bidirectional LSTM and other neural network architectures

A Graves et al.

NEURAL NETWORKS (2005)