4.7 Article

I²Transformer: Intra- and Inter-Relation Embedding Transformer for TV Show Captioning

Related references

Note: Only some of the references are listed.
Article Engineering, Electrical & Electronic

Task-Adaptive Attention for Image Captioning

Chenggang Yan et al.

Summary: This paper proposes a Task-Adaptive Attention module for image captioning, which learns non-visual clues to counter the misleading effect of attention models during word generation. The module is further enhanced with diversity regularization to improve its expressive ability. Experimental results on the MSCOCO captioning dataset show that the module improves the performance of a vanilla Transformer-based image captioning model.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2022)

Article Computer Science, Artificial Intelligence

Deep features for person re-identification on metric learning

Wanyin Wu et al.

Summary: This study summarizes different types of features and metric learning approaches for person re-identification from a label-attribute perspective. Combining advanced methods in data enhancement and feature extraction, the authors conducted comprehensive experiments on metric learning methods using two datasets, revealing the relationships between loss functions, deep feature space, and metric learning.

PATTERN RECOGNITION (2021)

Article Computer Science, Artificial Intelligence

Enhancing the alignment between target words and corresponding frames for video captioning

Yunbin Tu et al.

Summary: Video captioning aims to translate video frames into words using an encoder-decoder framework. Introducing pre-detected visual tags and a Textual-Temporal Attention Model improves the alignment between target words and video frames, which enhances translation accuracy.

PATTERN RECOGNITION (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Improving Video Captioning with Temporal Composition of a Visual-Syntactic Embedding

Jesus Perez-Martin et al.

Summary: This paper addresses the issue of syntactically incorrect sentences in video captioning by integrating syntactic representation learning into the process. The proposed Visual-Semantic-Syntactic Aligned Network architecture achieves state-of-the-art results on two widely used video captioning datasets.

2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021 (2021)

Article Computer Science, Artificial Intelligence

Vocabulary-Wide Credit Assignment for Training Image Captioning Models

Han Liu et al.

Summary: The study proposes a new credit assignment method for reinforcement learning algorithms, called vocabulary-wide credit assignment, which assigns appropriate credit to each word in the vocabulary at each generation step. This method has been applied to training image captioning models, leading to better experimental results.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2021)

Article Computer Science, Artificial Intelligence

Cross-Domain Image Captioning via Cross-Modal Retrieval and Model Adaptation

Wentian Zhao et al.

Summary: In this paper, a cross-domain image captioning method is proposed, which leverages a cross-modal retrieval model to generate pseudo image-sentence pairs in the target domain to facilitate model adaptation. Experimental results demonstrate that the method achieves better performance on different datasets.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2021)

Article Computer Science, Information Systems

Integrating Part of Speech Guidance for Image Captioning

Ji Zhang et al.

Summary: The paper proposes an image captioning method that integrates part-of-speech information. It uses a part-of-speech prediction network within an encoder-decoder framework, together with multi-task learning, to generate captions that carry more accurate visual information and comply better with language habits and grammar rules.

IEEE TRANSACTIONS ON MULTIMEDIA (2021)

Article Computer Science, Information Systems

STAT: Spatial-Temporal Attention Mechanism for Video Captioning

Chenggang Yan et al.

IEEE TRANSACTIONS ON MULTIMEDIA (2020)

Article Computer Science, Artificial Intelligence

Learning visual relationship and context-aware attention for image captioning

Junbo Wang et al.

PATTERN RECOGNITION (2020)

Article Computer Science, Artificial Intelligence

An Ensemble of Generation- and Retrieval-Based Image Captioning With Dual Generator Generative Adversarial Network

Min Yang et al.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2020)

Article Computer Science, Artificial Intelligence

Video Captioning With Object-Aware Spatio-Temporal Correlation and Aggregation

Junchao Zhang et al.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2020)

Article Computer Science, Artificial Intelligence

Domain-Weighted Majority Voting for Crowdsourcing

Dapeng Tao et al.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2019)

Article Computer Science, Artificial Intelligence

CAM-RNN: Co-Attention Model Based RNN for Video Captioning

Bin Zhao et al.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2019)

Proceedings Paper Computer Science, Artificial Intelligence

Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning

Tanzila Rahman et al.

2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) (2019)

Proceedings Paper Computer Science, Artificial Intelligence

Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network

Bairui Wang et al.

2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) (2019)

Article Computer Science, Artificial Intelligence

Video Captioning by Adversarial LSTM

Yang Yang et al.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2018)

Proceedings Paper Computer Science, Artificial Intelligence

Task-Driven Dynamic Fusion: Reducing Ambiguity in Video Description

Xishan Zhang et al.

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) (2017)

Proceedings Paper Computer Science, Artificial Intelligence

Attention-Based Multimodal Fusion for Video Description

Chiori Hori et al.

2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) (2017)

Proceedings Paper Computer Science, Artificial Intelligence

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

Joao Carreira et al.

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) (2017)

Article Computer Science, Artificial Intelligence

ImageNet Large Scale Visual Recognition Challenge

Olga Russakovsky et al.

INTERNATIONAL JOURNAL OF COMPUTER VISION (2015)