Related references
Note: Only some of the references are listed.
Article
Computer Science, Theory & Methods
Salman Khan et al.
Summary: Transformer models have shown impressive results in computer vision tasks by modeling long-range dependencies, supporting parallel processing, and handling multi-modal data. They are widely used in visual recognition, generative modeling, multi-modal tasks, video processing, low-level vision, and three-dimensional analysis, showcasing their strengths in scalability and flexibility.
ACM COMPUTING SURVEYS
(2022)
Article
Mathematics, Applied
Xilin Xin et al.
Summary: This paper proposes a novel online mode-free integral reinforcement learning algorithm to solve multiplayer non-zero sum games. By collecting and learning subsystem information of states and inputs, and using online learning to compute the corresponding N-coupled algebraic Riccati equations, the policy iteration algorithm presented in this paper can solve the coupled algebraic Riccati equations of multiplayer non-zero sum games. The effectiveness and feasibility of the design method are verified through a simulation example involving three players.
APPLIED MATHEMATICS AND COMPUTATION
(2022)
Article
Automation & Control Systems
Yiwei Wei et al.
Summary: This study introduces an Outside-in Attention mechanism to address the limitations of recurrent attention and self-attention in image captioning tasks. By combining the advantages of both transformer and recurrent networks, it achieves competitive results.
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
(2022)
Article
Computer Science, Artificial Intelligence
Qingzhong Wang et al.
Summary: This paper proposes a metric to measure the diversity of a set of image captions and explores methods for training caption models to generate diverse captions. The proposed diversity metrics show a strong correlation with human evaluation, and the experiments demonstrate the performance differences in terms of diversity and accuracy. Different techniques, such as reinforcement learning and ensemble matrix, are employed to improve the diversity and accuracy of generated captions.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2022)
Article
Computer Science, Artificial Intelligence
Zheng-Jun Zha et al.
Summary: The study focuses on image captioning and proposes a Context-Aware Visual Policy network (CAVP) to capture visual context, improving the richness and detail of image descriptions. CAVP explicitly considers previous visual attentions as context during captioning, allowing for the generation of more complex and detailed sentence and paragraph descriptions.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2022)
Review
Computer Science, Artificial Intelligence
Gaifang Luo et al.
Summary: This survey provides a comprehensive overview of image captioning methods, categorizing them based on techniques and discussing their advantages and limitations. By quantitatively comparing related state-of-the-art studies, recent trends and future directions in image captioning are determined. The ultimate goal is to serve as a tool for understanding existing literature and highlighting future directions in the field for the benefit of the Computer Vision and Natural Language Processing communities.
IET IMAGE PROCESSING
(2022)
Article
Computer Science, Artificial Intelligence
Jia Huei Tan et al.
Summary: For the first time in image captioning research, an extensive comparison of various unstructured weight pruning methods is provided on three different popular image captioning architectures. A novel end-to-end weight pruning method is proposed, showing that an 80% to 95% sparse network can match or outperform its dense counterpart. Pre-trained models achieving high CIDEr scores on the MSCOCO dataset with significantly reduced model size are publicly available.
PATTERN RECOGNITION
(2022)
Article
Computer Science, Artificial Intelligence
Boyang Wan et al.
Summary: By evaluating the generalization ability of existing image captioning models (ICMs), we found that using a combination of low- and high-level object features, a self-attention mechanism, and a multi-stage language decoder could be effective ways to improve the performance of ICMs.
PATTERN RECOGNITION
(2022)
Article
Computer Science, Artificial Intelligence
Jian Han Lim et al.
Summary: This paper proposes two different embedding schemes in a recurrent neural network to protect image captioning models, which do not compromise the original performance and can withstand both removal and ambiguity attacks.
PATTERN RECOGNITION
(2022)
Article
Automation & Control Systems
Maofu Liu et al.
Summary: Automatic image captioning is a complex research issue in the field of artificial intelligence, involving computer vision and natural language processing. Despite the remarkable performance of the neural image caption (NIC) model, there are still challenges in achieving accurate and diverse image captions, as well as overcoming the deviation and monotony issues. The distinction between Chinese and English in syntax and semantics necessitates the development of specialized Chinese image caption generation methods. Our NICVATP2L model, incorporating visual attention and topic modeling, effectively addresses these challenges and outperforms existing NIC models.
IEEE TRANSACTIONS ON CYBERNETICS
(2022)
Article
Computer Science, Artificial Intelligence
Zuopeng Yang et al.
Summary: This paper introduces a Human-Centric Captioning Model focused on describing human behavior in images. By creating a specialized COCO dataset and developing the HCCM model, the study achieves state-of-the-art performance in understanding and describing diverse human activities.
PATTERN RECOGNITION
(2022)
Article
Computer Science, Artificial Intelligence
Nora Horanyi et al.
Summary: This study proposes a novel optimization framework that optimizes image cropping parameters based on user description and aesthetics. Instead of training a separate network, pre-trained networks on image captioning and aesthetic tasks are repurposed. The framework employs three strategies to ensure stable optimization and produces crops that align with user descriptions and aesthetics.
PATTERN RECOGNITION
(2022)
Article
Engineering, Multidisciplinary
Hongfeng Tao et al.
Summary: With the rapid development of industrial informatization and deep learning technology, modern data-driven fault diagnosis methods based on deep learning have attracted attention from the industry. However, the scarcity of fault samples in actual industrial environments and cross-domain problems between different devices limit the development of these methods. This paper proposes a model unknown matching network model for fault diagnosis with few samples, which combines parameter optimization and feature metric to address these limitations and achieves promising results in experiments.
MEASUREMENT SCIENCE AND TECHNOLOGY
(2022)
Article
Computer Science, Artificial Intelligence
Chi Wang et al.
Summary: This paper proposes an improved Geometry Attention Transformer (GAT) framework for image captioning, which incorporates a geometry gate-controlled self-attention refiner and position-LSTMs to enhance performance. Experimental results demonstrate that the GAT outperforms current state-of-the-art image captioning models.
EXPERT SYSTEMS WITH APPLICATIONS
(2022)
Article
Computer Science, Information Systems
Litao Yu et al.
Summary: This paper proposes a method of applying dual attention on pyramid image feature maps to improve the quality of generated sentences from images. The method achieves impressive results on multiple datasets and has a highly modular nature, making it easily applicable to other image captioning modules.
IEEE TRANSACTIONS ON MULTIMEDIA
(2022)
Article
Computer Science, Artificial Intelligence
Jing Zhang et al.
Summary: A visual enhanced gLSTM model is proposed for image caption generation in this paper, which utilizes visual features from the region of interest in images as guiding information to improve the accuracy of image captions. Experimental results show that the proposed method outperforms the baseline gLSTM algorithm and other popular image captioning methods in terms of caption accuracy.
EXPERT SYSTEMS WITH APPLICATIONS
(2021)
Article
Computer Science, Artificial Intelligence
Junzhong Ji et al.
Summary: A novel divergent-convergent attention (DCA) model is proposed to address the issues in current attention-based image captioning methods. By utilizing multi-perspective inputs and adaptive attention merging, the model achieves more precise focus on local image regions and generates more descriptive sentences. The interaction between visual and semantic components contributes to the model's superior performance on the MS COCO dataset.
PATTERN RECOGNITION
(2021)
Article
Computer Science, Artificial Intelligence
Yu Zhang et al.
Summary: This paper applies the Transformer model to image captioning tasks and improves its performance in two aspects by adding a KL divergence term and leveraging knowledge graphs. Experimental results on benchmark datasets show the effectiveness of the proposed method.
PATTERN RECOGNITION LETTERS
(2021)
Article
Computer Science, Information Systems
Xian Zhong et al.
Summary: The proposed image captioning scheme based on adaptive spatial information attention (ASIA) extracts spatial information of salient objects and uses different techniques in the encoding and decoding stages. Extensive experiments on two datasets show improved captioning performance.
JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION
(2021)
Article
Computer Science, Information Systems
Weitao Jiang et al.
Summary: The paper proposes a novel Multi-Gate Attention Network (MGAN) for image captioning, integrating Multi-Gate Attention (MGA) blocks to enhance feature representation and capture relevant information. Experiments show that MGAN outperforms most state-of-the-art methods on the MS COCO dataset, demonstrating its generalizability when combined with other methods incorporating MGA blocks.
Article
Computer Science, Information Systems
Jie Wu et al.
Summary: In the field of image captioning, a novel global-local discriminative objective is proposed to generate fine-grained descriptive captions. The method outperforms baseline methods on the widely used MS-COCO dataset and competes with existing leading approaches.
IEEE TRANSACTIONS ON MULTIMEDIA
(2021)
Article
Computer Science, Information Systems
Ji Zhang et al.
Summary: The paper proposes an integrated image captioning method that incorporates part-of-speech information, using a part-of-speech prediction network within an encoder-decoder framework together with multi-task learning, to generate captions with more accurate visual information and better compliance with language habits and grammar rules.
IEEE TRANSACTIONS ON MULTIMEDIA
(2021)
Article
Computer Science, Artificial Intelligence
Junbo Wang et al.
PATTERN RECOGNITION
(2020)
Article
Computer Science, Information Systems
Chao Li et al.
INFORMATION PROCESSING & MANAGEMENT
(2020)
Article
Computer Science, Hardware & Architecture
Heng Song et al.
COMPUTERS & ELECTRICAL ENGINEERING
(2020)
Article
Engineering, Electrical & Electronic
Jun Yu et al.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
(2020)
Article
Computer Science, Artificial Intelligence
Min Yang et al.
IEEE TRANSACTIONS ON IMAGE PROCESSING
(2020)
Article
Computer Science, Artificial Intelligence
Wenjie Cai et al.
Article
Computer Science, Artificial Intelligence
Lian Zhou et al.
IEEE TRANSACTIONS ON IMAGE PROCESSING
(2020)
Article
Computer Science, Artificial Intelligence
Yiqing Huang et al.
IEEE TRANSACTIONS ON IMAGE PROCESSING
(2020)
Article
Computer Science, Interdisciplinary Applications
P. R. Devi et al.
PATTERN RECOGNITION AND IMAGE ANALYSIS
(2020)
Article
Computer Science, Software Engineering
Xiaoxiao Liu et al.
Article
Computer Science, Information Systems
Min Yang et al.
IEEE TRANSACTIONS ON MULTIMEDIA
(2019)
Article
Computer Science, Artificial Intelligence
Niange Yu et al.
IEEE TRANSACTIONS ON IMAGE PROCESSING
(2019)
Article
Computer Science, Artificial Intelligence
Xinyu Xiao et al.
PATTERN RECOGNITION
(2019)
Article
Computer Science, Artificial Intelligence
Guiguang Ding et al.
COGNITIVE COMPUTATION
(2019)
Article
Chemistry, Multidisciplinary
Jiangyun Li et al.
APPLIED SCIENCES-BASEL
(2019)
Article
Computer Science, Artificial Intelligence
Senmao Ye et al.
IEEE TRANSACTIONS ON IMAGE PROCESSING
(2018)
Article
Computer Science, Information Systems
Linghui Li et al.
IEEE TRANSACTIONS ON MULTIMEDIA
(2018)
Article
Computer Science, Artificial Intelligence
Shuang Bai et al.
Article
Computer Science, Artificial Intelligence
Pengjie Tang et al.
Article
Chemistry, Multidisciplinary
Xinxin Zhu et al.
APPLIED SCIENCES-BASEL
(2018)
Article
Computer Science, Artificial Intelligence
Xinxin Zhu et al.
Article
Computer Science, Artificial Intelligence
Oriol Vinyals et al.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2017)
Article
Computer Science, Artificial Intelligence
Vicente Ordonez et al.
INTERNATIONAL JOURNAL OF COMPUTER VISION
(2016)
Article
Automation & Control Systems
Vladimir Stojanovic et al.
INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL
(2016)
Article
Computer Science, Artificial Intelligence
Girish Kulkarni et al.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2013)
Article
Computer Science, Artificial Intelligence
Micah Hodosh et al.
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH
(2013)
Article
Computer Science, Artificial Intelligence
Pedro F. Felzenszwalb et al.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2010)
Article
Computer Science, Artificial Intelligence
A. Graves et al.