Parallel Dense Video Caption Generation with Multi-Modal Features

MATHEMATICS (2023)

Journal

MATHEMATICS

Volume 11, Issue 17

Publisher

MDPI

DOI: 10.3390/math11173685

Keywords

dense video caption; video captioning; multimodal feature fusion; feature extraction; neural network

Automated Summary

This work proposes a parallel dense video captioning method that addresses the mutual constraint between event proposals and captions. It introduces a deformable Transformer framework to reduce or eliminate manually set threshold hyperparameters. On the ActivityNet Captions dataset, the proposed method achieves competitive results and outperforms other methods to a certain degree.

Abstract

The task of dense video captioning is to generate detailed natural-language descriptions for an original video, which requires deep analysis and mining of semantic information to identify the events in the video. Existing methods typically follow a localisation-then-captioning sequence within given frame sequences, so caption generation depends heavily on which objects have been detected. This work proposes a parallel dense video captioning method that simultaneously addresses the mutual constraint between event proposals and captions. Additionally, a deformable Transformer framework is introduced to reduce or eliminate the manually set threshold hyperparameters that such methods require. An information transfer station is also added to organise representations: it receives the hidden features extracted from a frame and implicitly generates multiple event proposals. For caption generation, the proposed method adopts long short-term memory (LSTM) with deformable attention as the main layer. Experimental results on the ActivityNet Captions dataset show that the proposed method is competitive and outperforms other methods in this area to a certain degree.
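
The abstract names deformable attention but does not describe how it works. As a rough illustration only, the sketch below shows the general idea in one temporal dimension: instead of attending to every frame, a query samples a few fractional positions around a reference point and combines them with softmax weights. This is not the authors' implementation. In a trained model the offsets and scores would be predicted from the query by learned layers, whereas here they are passed in directly, and all function and variable names are hypothetical.

```python
import math


def interpolate(features, pos):
    """Linearly interpolate a sequence of feature vectors at a fractional index."""
    pos = min(max(pos, 0.0), len(features) - 1)  # clamp to the valid range
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(features) - 1)
    frac = pos - lo
    return [(1 - frac) * a + frac * b for a, b in zip(features[lo], features[hi])]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def deformable_attention_1d(features, reference, offsets, scores):
    """Sample features at reference + offset and sum them with softmax weights.

    Assumption: offsets and scores are given; a real model would predict them.
    """
    weights = softmax(scores)
    out = [0.0] * len(features[0])
    for offset, weight in zip(offsets, weights):
        sample = interpolate(features, reference + offset)
        out = [o + weight * s for o, s in zip(out, sample)]
    return out


# Toy input: six "frames", each with a two-dimensional feature [t, t^2].
frames = [[float(t), float(t * t)] for t in range(6)]
result = deformable_attention_1d(
    frames, reference=2.0, offsets=[-1.0, 0.0, 1.5], scores=[0.0, 0.0, 0.0]
)
```

With equal scores the three sampled positions (1.0, 2.0 and 3.5) are simply averaged. The cost depends on the number of sampling points rather than on the number of frames, which is the usual motivation for this kind of attention.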
