Article

Parallel Dense Video Caption Generation with Multi-Modal Features

Journal

MATHEMATICS
Volume 11, Issue 17

Publisher

MDPI
DOI: 10.3390/math11173685

Keywords

dense video caption; video captioning; multimodal feature fusion; feature extraction; neural network

This work proposes a parallel dense video captioning method that addresses the mutual constraint between event proposals and captions. It introduces a deformable Transformer framework to reduce or eliminate manually set hyperparameter thresholds. Experimental results on the ActivityNet Captions dataset show that the proposed method outperforms other methods in this area to a certain degree and provides competitive results.
The task of dense video captioning is to generate detailed natural-language descriptions for an input video, which requires in-depth analysis of the video's semantic content to identify the events it contains. Existing methods typically follow a localise-then-caption pipeline over the given frame sequence, so caption generation depends heavily on which objects have been detected. This work proposes a parallel dense video captioning method that addresses the mutual constraint between event proposals and captions simultaneously. In addition, a deformable Transformer framework is introduced to reduce or eliminate the manually set hyperparameter thresholds that such methods require. An information transfer station is also added to organise representations: it receives the hidden features extracted from a frame and implicitly generates multiple event proposals. The proposed method adopts an LSTM (long short-term memory) network with deformable attention as the main caption-generation layer. Experimental results on the ActivityNet Captions dataset show that the proposed method outperforms other methods in this area to a certain degree and provides competitive results.
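The abstract relies on deformable attention both in the Transformer framework and in the LSTM caption layer. As a rough illustration of the general mechanism (the deformable attention introduced in Deformable DETR, reduced here to one temporal dimension), the Python sketch below lets a query attend to only K sampled frames around a reference point instead of to every frame. It is not the authors' implementation: all function names are hypothetical, and the sampling offsets and attention logits, which a real model would predict from the query with learned linear layers, are passed in directly. Value projections, multiple heads and multiple feature scales are omitted.

```python
import math
from typing import List, Sequence

Vector = List[float]


def linear_sample(features: Sequence[Vector], pos: float) -> Vector:
    """Linearly interpolate a frame feature at fractional frame index `pos`.

    Positions outside [0, T-1] are clamped to the first or last frame.
    """
    t = len(features)
    pos = min(max(pos, 0.0), t - 1.0)
    lo = int(math.floor(pos))
    hi = min(lo + 1, t - 1)
    w = pos - lo  # interpolation weight toward the next frame
    return [(1.0 - w) * a + w * b for a, b in zip(features[lo], features[hi])]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def deformable_attention_1d(
    features: Sequence[Vector],
    ref_point: float,
    offsets: Sequence[float],
    logits: Sequence[float],
) -> Vector:
    """Single-head, single-scale temporal deformable attention (hypothetical sketch).

    features:  T frame feature vectors (the value sequence)
    ref_point: normalised reference location in [0, 1] for one query
    offsets:   K sampling offsets in frames (normally predicted from the query)
    logits:    K unnormalised attention scores (normally predicted from the query)
    Returns the attention-weighted sum of the K sampled features.
    """
    t = len(features)
    weights = softmax(logits)
    centre = ref_point * (t - 1)  # map [0, 1] to a frame index
    out = [0.0] * len(features[0])
    for w, off in zip(weights, offsets):
        sample = linear_sample(features, centre + off)
        for d, value in enumerate(sample):
            out[d] += w * value
    return out


if __name__ == "__main__":
    # Five frames with 2-dimensional features.
    feats = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0], [8.0, 9.0]]
    # Reference point at the middle frame, two sampling points, equal weights.
    print(deformable_attention_1d(feats, 0.5, [-0.5, 1.0], [0.0, 0.0]))  # [4.5, 5.5]
```

Because each query touches only K interpolated frames, the cost per query is independent of video length, which is the usual motivation for using deformable attention on long, untrimmed videos.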
