Article

Video Storytelling: Textual Summaries for Events

Journal

IEEE TRANSACTIONS ON MULTIMEDIA
Volume 22, Issue 2, Pages 554-565

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TMM.2019.2930041

Keywords

Visualization; Task analysis; Semantics; Streaming media; Recurrent neural networks; Measurement; Natural languages; Video storytelling; video captioning; sentence retrieval; multimodal embedding learning

Funding

  1. National Research Foundation, Prime Minister's Office, Singapore under its Strategic Capability Research Centres Funding Initiative

Abstract

Bridging vision and natural language is a longstanding goal in computer vision and multimedia research. While earlier works focus on generating a single-sentence description for visual content, recent works have studied paragraph generation. In this paper, we introduce the problem of video storytelling, which aims at generating coherent and succinct stories for long videos. Video storytelling introduces new challenges, mainly due to the diversity of the story and the length and complexity of the video. We propose novel methods to address these challenges. First, we propose a context-aware framework for multimodal embedding learning, in which we design a residual bidirectional recurrent neural network to leverage contextual information from both the past and the future. The multimodal embedding is then used to retrieve sentences for video clips. Second, we propose a Narrator model to select clips that are representative of the underlying storyline. The Narrator is formulated as a reinforcement learning agent, trained by directly optimizing the textual metric of the generated story. We evaluate our method on the Video Story dataset, a new dataset that we collected to enable this study. We compare our method with multiple state-of-the-art baselines and show that it achieves better performance in terms of both quantitative measures and a user study.
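The abstract describes two components: a residual bidirectional RNN that produces context-aware multimodal embeddings used to retrieve sentences for clips, and an RL-trained Narrator that selects story-relevant clips. The sketch below illustrates only the first component in PyTorch, assuming a bidirectional GRU, illustrative layer sizes, and cosine-similarity retrieval; these are placeholder choices for exposition, not the authors' exact configuration.

```python
# Hedged sketch: a context-aware clip encoder in the spirit of the abstract.
# A bidirectional GRU runs over the sequence of clip features so each clip
# sees past and future context; its output is added back to the input as a
# residual, then projected into a joint visual-textual embedding space.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBiRNNEncoder(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=512, embed_dim=300):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True,
                          bidirectional=True)
        # Map the bidirectional hidden states back to the feature dimension
        # so they can be added to the input as a residual connection.
        self.back_proj = nn.Linear(2 * hidden_dim, feat_dim)
        # Project the context-aware clip features into the joint space.
        self.embed = nn.Linear(feat_dim, embed_dim)

    def forward(self, clip_feats):
        # clip_feats: (batch, num_clips, feat_dim)
        context, _ = self.rnn(clip_feats)          # past/future context
        residual = clip_feats + self.back_proj(context)
        return F.normalize(self.embed(residual), dim=-1)

# Retrieval: score candidate sentence embeddings against each clip embedding
# by cosine similarity (both are L2-normalized, so a dot product suffices).
encoder = ResidualBiRNNEncoder()
clips = torch.randn(1, 8, 2048)                        # 8 clips from one video
clip_emb = encoder(clips)                              # (1, 8, 300)
sent_emb = F.normalize(torch.randn(100, 300), dim=-1)  # 100 candidate sentences
scores = clip_emb @ sent_emb.t()                       # (1, 8, 100)
best = scores.argmax(dim=-1)                           # best sentence per clip
```

In practice such an embedding would be trained with a ranking objective that pulls matched clip-sentence pairs together, and the Narrator would be trained separately, e.g. with a policy-gradient method rewarded by the textual metric of the assembled story, as the abstract indicates.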
