Journal
JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES
Volume 35, Issue 1, Pages 257-269
Publisher
ELSEVIER
DOI: 10.1016/j.jksuci.2022.11.015
Keywords
Bengali video captioning; Convolutional neural network; Encoder-decoder model; Recurrent neural network; Attention-mechanism
Abstract
Video captioning is an automated process of captioning a video by understanding the content within it. Although numerous studies have been performed on video captioning in English, the field of video captioning in Bengali remains nearly unexplored. Therefore, this research aims at generating Bengali captions that plausibly describe the gist of a specific video, as well as identifying the best-performing model for Bengali video captioning. To accomplish this, several sequence-to-sequence models (LSTM, BiLSTM, and GRU) are implemented that take video frame features as input, extracted through different CNN models (VGG-19, Inceptionv3, and ResNet50v2), and provide a corresponding textual description as output. Moreover, the attention mechanism is incorporated with these models as a first-ever attempt in Bengali video captioning. In this study, a novel Bengali video captioning dataset is constructed from the Microsoft Research Video Description Corpus (MSVD) dataset (an English video captioning dataset) by utilizing a deep learning-based translator and manual post-editing. Finally, the models' performance is evaluated in terms of the popular evaluation metrics BLEU, METEOR, and ROUGE. The proposed attention-based hybrid model outperforms existing models on these metrics, establishing a new benchmark for Bengali video captioning. (c) 2022 The Author(s). Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
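The attention mechanism the abstract describes (scoring per-frame CNN features against the decoder state at each generation step) can be illustrated with a minimal additive (Bahdanau-style) attention sketch. This is not the paper's implementation; all names, dimensions, and the random weights below are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def additive_attention(frame_feats, dec_state, Wf, Wd, v):
    """Additive attention: score each frame feature against the decoder state,
    then return the attention-weighted context vector. Shapes are illustrative."""
    scores = np.array([v @ np.tanh(Wf @ f + Wd @ dec_state) for f in frame_feats])
    weights = softmax(scores)        # one weight per video frame, sums to 1
    context = weights @ frame_feats  # weighted sum of CNN frame features
    return context, weights

# Toy setup: 8 frames of 16-d CNN features, a 12-d decoder state (hypothetical sizes).
rng = np.random.default_rng(0)
T, Df, Dd, Da = 8, 16, 12, 10
frames = rng.normal(size=(T, Df))
state = rng.normal(size=Dd)
Wf = rng.normal(size=(Da, Df))
Wd = rng.normal(size=(Da, Dd))
v = rng.normal(size=Da)

ctx, w = additive_attention(frames, state, Wf, Wd, v)
```

In the full model, `ctx` would be concatenated with the decoder input at each step, so the LSTM/BiLSTM/GRU decoder can focus on the frames most relevant to the next Bengali word.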