4.5 Article

A Framework for Image Captioning Based on Relation Network and Multilevel Attention Mechanism

Related References

Note: Only some of the references are listed; download the original article for the complete reference information.
Article Engineering, Electrical & Electronic

Task-Adaptive Attention for Image Captioning

Chenggang Yan et al.

Summary: This paper proposes a Task-Adaptive Attention module for image captioning, which learns non-visual clues so that attention models are not misled during word generation. The module is further enhanced with diversity regularization to improve its expressive ability. Experimental results on the MSCOCO captioning dataset show that the module improves the performance of a vanilla Transformer-based image captioning model.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2022)

Article Computer Science, Artificial Intelligence

An Improved Attention and Hybrid Optimization Technique for Visual Question Answering

Himanshu Sharma et al.

Summary: This paper proposes a new VQA model that uses effective image features and a graph neural network to answer questions about foreground objects and background regions, and to generate image captions based on visual relationships. The model's performance is improved by combining two attention modules with a hybrid algorithm.

NEURAL PROCESSING LETTERS (2022)

Article Computer Science, Artificial Intelligence

A New Attention-Based LSTM for Image Captioning

Fen Xiao et al.

Summary: This paper proposes an attentional LSTM (ALSTM) for image captioning. Unlike a traditional LSTM, ALSTM can refine the input vector by learning from the network's hidden states and sequential context information. ALSTM is used as the decoder in several classical frameworks to demonstrate how effective visual/context attention can be obtained. Extensive evaluations show the superiority of ALSTM in generating high-quality image descriptions.

NEURAL PROCESSING LETTERS (2022)

Article Computer Science, Artificial Intelligence

Data-efficient image captioning of fine art paintings via virtual-real semantic alignment training

Yue Lu et al.

Summary: This paper presents an image captioning method for fine art paintings and proposes a virtual-real semantic alignment training process to address the challenges in painting captioning. Evaluation results show the effectiveness of the method in two data-hungry scenarios.

NEUROCOMPUTING (2022)

Article Computer Science, Artificial Intelligence

A framework for visual question answering with the integration of scene-text using PHOCs and fisher vectors

Himanshu Sharma et al.

Summary: This study proposes a novel VQA model that uses text cues in images to improve accuracy. It combines PHOC, a Fisher vector representation, and a transformer model with dynamic pointer networks for answer decoding, and shows its effectiveness over existing models on popular datasets.

EXPERT SYSTEMS WITH APPLICATIONS (2022)

Article Engineering, Electrical & Electronic

Region-Aware Image Captioning via Interaction Learning

An-An Liu et al.

Summary: Image captioning, one of the primary goals in computer vision, aims to automatically generate natural descriptions for images. This paper proposes a region-aware interaction learning method to explicitly capture the semantic correlations between regions and objects for word inference, effectively capturing contextual information.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2022)

Article Engineering, Electrical & Electronic

High-Order Interaction Learning for Image Captioning

Yanhui Wang et al.

Summary: Image captioning aims to generate sentence descriptions of images by learning interactions among objects and relationships. Leveraging high-order interactions is expected to benefit image captioning and reasoning.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2022)

Article Computer Science, Artificial Intelligence

Leveraging Knowledge Graphs and Deep Learning for automatic art analysis

Giovanna Castellano et al.

KNOWLEDGE-BASED SYSTEMS (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Injecting Semantic Concepts into End-to-End Image Captioning

Zhiyuan Fang et al.

Summary: In recent years, significant progress has been made in developing better image captioning models, with some models using grid representations for more flexible training and faster inference. This paper proposes a pure vision transformer-based image captioning model, and introduces a concept token network to predict semantic concepts, achieving competitive performance on challenging image captioning datasets.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) (2022)

Proceedings Paper Computer Science, Theory & Methods

The Unreasonable Effectiveness of CLIP Features for Image Captioning: An Experimental Analysis

Manuele Barraco et al.

Summary: This paper examines the advantage of using CLIP as a visual encoder in image captioning. Through extensive experiments, it demonstrates that CLIP outperforms commonly used visual encoders in various architectures and evaluation protocols, including both traditional captioning performance and zero-shot transfer.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022 (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Explainability for Medical Image Captioning

Djamila Beddiar et al.

Summary: Medical image captioning is the process of generating clinically significant descriptions for medical images. Automatic captioning of medical images is beneficial for medical experts, but current methods still provide weak and incorrect descriptions. To address this, the paper proposes an explainable module that interprets the correspondence between visual features and semantic features to offer a sound interpretation of the generated captions.

2022 ELEVENTH INTERNATIONAL CONFERENCE ON IMAGE PROCESSING THEORY, TOOLS AND APPLICATIONS (IPTA) (2022)

Article Computer Science, Information Systems

Bi-Directional Co-Attention Network for Image Captioning

Weitao Jiang et al.

Summary: In this article, a Bidirectional Co-Attention Network (BCAN) is proposed to improve the accuracy of image captioning by combining multiple visual features and adopting the Multivariate Residual Module (MRM) for multimodal integration. The BCAN can obtain complementary information from multiple visual features via the bi-directional co-attention strategy, and integrate multimodal information via the improved multivariate residual strategy, achieving superior performance compared to existing methods on benchmark datasets.

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS (2021)

Article Computer Science, Artificial Intelligence

A survey of methods, datasets and evaluation metrics for visual question answering

Himanshu Sharma et al.

Summary: Visual Question Answering (VQA) is a challenging research problem that combines computer vision and natural language processing. Researchers need to leverage common sense reasoning, image information, and world knowledge to provide accurate answers. In addition to traditional models, new VQA models and evaluation metrics are continuously being developed to improve performance.

IMAGE AND VISION COMPUTING (2021)

Article Computer Science, Artificial Intelligence

Visual question answering model based on graph neural network and contextual attention

Himanshu Sharma et al.

Summary: Visual Question Answering (VQA) is an emerging research area in computer vision and natural language processing that aims to predict answers to natural-language questions about images. However, current VQA approaches often overlook the relationships and reasoning among regions of interest. The VQA model proposed in this paper considers previously attended visual content, leading to improved accuracy in answer prediction.

IMAGE AND VISION COMPUTING (2021)

Article Computer Science, Information Systems

RTFN: A robust temporal feature network for time series classification

Zhiwen Xiao et al.

Summary: Time series data contains both local and global patterns, but existing feature networks focus on local features and neglect the relationships among them. Therefore, a novel RTFN method is proposed for feature extraction in time series, consisting of TFN and LSTMaN. Experimental results show that the RTFN-based structures achieve excellent performance on multiple datasets.

INFORMATION SCIENCES (2021)

Article Computer Science, Artificial Intelligence

Integration of textual cues for fine-grained image captioning using deep CNN and LSTM

Neeraj Gupta et al.

NEURAL COMPUTING & APPLICATIONS (2020)

Article Computer Science, Artificial Intelligence

Learning visual relationship and context-aware attention for image captioning

Junbo Wang et al.

PATTERN RECOGNITION (2020)

Article Computer Science, Information Systems

A Novel IoT-Perceptive Human Activity Recognition (HAR) Approach Using Multihead Convolutional Attention

Haoxi Zhang et al.

IEEE INTERNET OF THINGS JOURNAL (2020)

Article Computer Science, Information Systems

Constrained LSTM and Residual Attention for Image Captioning

Liang Yang et al.

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS (2020)

Article Physics, Applied

Incorporating external knowledge for image captioning using CNN and LSTM

Himanshu Sharma et al.

MODERN PHYSICS LETTERS B (2020)

Article Computer Science, Artificial Intelligence

Deep Collaborative Multi-View Hashing for Large-Scale Image Search

Lei Zhu et al.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2020)

Article Computer Science, Artificial Intelligence

Image Caption Generation with Part of Speech Guidance

Xinwei He et al.

PATTERN RECOGNITION LETTERS (2019)

Article Computer Science, Artificial Intelligence

A hierarchical and regional deep learning architecture for image description generation

Philip Kinghorn et al.

PATTERN RECOGNITION LETTERS (2019)

Article Computer Science, Information Systems

Image Captioning by Asking Questions

Xiaoshan Yang et al.

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS (2019)

Article Computer Science, Information Systems

Know More Say Less: Image Captioning Based on Scene Graphs

Xiangyang Li et al.

IEEE TRANSACTIONS ON MULTIMEDIA (2019)

Article Computer Science, Theory & Methods

A Comprehensive Survey of Deep Learning for Image Captioning

Md Zakir Hossain et al.

ACM COMPUTING SURVEYS (2019)

Article Computer Science, Artificial Intelligence

Attentive Linear Transformation for Image Captioning

Senmao Ye et al.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2018)

Proceedings Paper Computer Science, Artificial Intelligence

Regularizing RNNs for Caption Generation by Reconstructing The Past with The Present

Xinpeng Chen et al.

2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) (2018)

Article Computer Science, Artificial Intelligence

A Strength Pareto Evolutionary Algorithm Based on Reference Direction for Multiobjective and Many-Objective Optimization

Shouyong Jiang et al.

IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION (2017)

Article Computer Science, Artificial Intelligence

Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge

Oriol Vinyals et al.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2017)

Article Computer Science, Artificial Intelligence

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

Ranjay Krishna et al.

INTERNATIONAL JOURNAL OF COMPUTER VISION (2017)

Article Engineering, Electrical & Electronic

Cascade recurrent neural network for image caption generation

Jie Wu et al.

ELECTRONICS LETTERS (2017)

Proceedings Paper Computer Science, Artificial Intelligence

Visual Dialog

Abhishek Das et al.

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) (2017)

Proceedings Paper Computer Science, Artificial Intelligence

SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning

Long Chen et al.

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) (2017)

Article Computer Science, Artificial Intelligence

Decision trees: a recent overview

S. B. Kotsiantis

ARTIFICIAL INTELLIGENCE REVIEW (2013)

Article Computer Science, Artificial Intelligence

The Graph Neural Network Model

Franco Scarselli et al.

IEEE TRANSACTIONS ON NEURAL NETWORKS (2009)