
Recognizing and Presenting the Storytelling Video Structure With Deep Multimodal Networks

IEEE TRANSACTIONS ON MULTIMEDIA (2017)

Journal

IEEE TRANSACTIONS ON MULTIMEDIA

Volume 19, Issue 5, Pages 955-968

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TMM.2016.2644872

Keywords

Deep networks; performance evaluation; scene detection; temporal video segmentation

Categories

Computer Science, Information Systems; Computer Science, Software Engineering; Telecommunications

Funding

project Citta educante of the National Technological Cluster on Smart Communities - Italian Ministry of Education, University and Research (MIUR) [CTN01_00034_393801]


Abstract

In this paper, we propose a novel scene detection algorithm that employs semantic, visual, textual, and audio cues. We also show how the hierarchical decomposition of the storytelling video structure can improve the presentation of retrieval results with semantically and aesthetically effective thumbnails. Our method builds on two advances over the state of the art: first, semantic feature extraction that builds video-specific concept detectors; and second, multimodal feature embedding learning that maps the feature vector of a shot to a space in which the Euclidean distance has task-specific semantic properties. The proposed method decomposes the video into annotated temporal segments, which allows query-specific thumbnail extraction. Extensive experiments on different data sets demonstrate the effectiveness of our algorithm. We also discuss in depth how to deal with the subjectivity of the task and suggest a strategy to overcome the problem.
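
To make the role of the embedding space concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes each shot already has an embedding vector and places a scene boundary wherever the Euclidean distance between consecutive shots exceeds a threshold. The function names, the `threshold` parameter, and the toy 2-D vectors are all hypothetical and chosen only for illustration; the abstract does not specify how shots are grouped once embedded.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def segment_scenes(
    embeddings: List[Sequence[float]], threshold: float
) -> List[List[int]]:
    """Group consecutive shot indices into scenes.

    Assumption (not from the paper): a new scene starts whenever the
    distance between a shot and the previous shot exceeds `threshold`.
    """
    if not embeddings:
        return []
    scenes = [[0]]
    for i in range(1, len(embeddings)):
        if euclidean(embeddings[i - 1], embeddings[i]) > threshold:
            scenes.append([i])      # large jump: open a new scene
        else:
            scenes[-1].append(i)    # small jump: extend the current scene
    return scenes


# Toy 2-D shot embeddings, invented for illustration only.
shots = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (2.0, 2.0), (2.1, 2.0)]
scenes = segment_scenes(shots, threshold=1.0)
print(scenes)
```

With these toy vectors the first three shots lie close together and the last two form a second cluster, so the sketch yields two scenes. In a learned embedding, the threshold would need to be tuned for the task, and a real system would likely use a more robust grouping strategy than a single pairwise comparison.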
