Journal
IEEE TRANSACTIONS ON MULTIMEDIA
Volume 19, Issue 5, Pages 955-968
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TMM.2016.2644872
Keywords
Deep networks; performance evaluation; scene detection; temporal video segmentation
Funding
- project Citta educante of the National Technological Cluster on Smart Communities - Italian Ministry of Education, University and Research (MIUR) [CTN01_00034_393801]
Abstract
In this paper, we propose a novel scene detection algorithm that employs semantic, visual, textual, and audio cues. We also show how the hierarchical decomposition of the storytelling video structure can improve the presentation of retrieval results with semantically and aesthetically effective thumbnails. Our method builds upon two advancements of the state of the art: first, semantic feature extraction, which builds video-specific concept detectors; and second, multimodal feature embedding learning, which maps the feature vector of a shot to a space in which the Euclidean distance has task-specific semantic properties. The proposed method is able to decompose the video into annotated temporal segments, which allow query-specific thumbnail extraction. Extensive experiments are performed on different data sets to demonstrate the effectiveness of our algorithm. An in-depth discussion of how to deal with the subjectivity of the task is conducted, and a strategy to overcome the problem is suggested.
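The abstract mentions an embedding space in which the Euclidean distance between shots carries task-specific meaning. As a rough illustration of how such a space could be used for temporal segmentation, the sketch below starts a new scene wherever two consecutive shot embeddings are farther apart than a threshold. This is a simplified stand-in, not the algorithm of the paper: the function name `segment_scenes`, the fixed `threshold`, and the toy 2-D embeddings are all assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple


def segment_scenes(
    embeddings: Sequence[Sequence[float]], threshold: float
) -> List[Tuple[int, int]]:
    """Group consecutive shots into scenes.

    A new scene starts whenever the Euclidean distance between two
    consecutive shot embeddings exceeds `threshold` (a hypothetical,
    hand-picked value). Returns inclusive (first_shot, last_shot) pairs.
    """
    if not embeddings:
        return []
    scenes = []
    start = 0
    for i in range(1, len(embeddings)):
        # math.dist computes the Euclidean distance (Python 3.8+).
        if math.dist(embeddings[i - 1], embeddings[i]) > threshold:
            scenes.append((start, i - 1))
            start = i
    # Close the final scene, which always ends at the last shot.
    scenes.append((start, len(embeddings) - 1))
    return scenes


# Toy embeddings (illustrative only): three visually obvious clusters.
shots = [(0.0, 0.1), (0.1, 0.0), (2.0, 2.1), (2.1, 2.0), (5.0, 0.0)]
print(segment_scenes(shots, threshold=1.0))
```

With these toy values, the first two shots, the next two shots, and the last shot end up in three separate scenes. In practice the embeddings would come from a learned model, and the boundary rule would likely be more elaborate than a single threshold.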