Article

DeepVS2.0: A Saliency-Structured Deep Learning Method for Predicting Dynamic Visual Attention

Journal

International Journal of Computer Vision

Publisher

SPRINGER
DOI: 10.1007/s11263-020-01371-6

Keywords

Deep neural networks; Saliency prediction; Convolutional LSTM; Eye-tracking database; Video; Video database

Funding

NSFC [61922009, 61876013, 61573037]

Summary

In this paper, a novel DNN-based video saliency prediction method, DeepVS2.0, is proposed, using the large-scale eye-tracking database LEDOV to train its DNN models. Two models, OM-CNN and SS-ConvLSTM, predict video saliency by exploiting human attention to objects and object motion, together with the temporal correlation of attention across frames. Experimental results show that DeepVS2.0 improves video saliency prediction accuracy over the state of the art.

Abstract

Deep neural networks (DNNs) have exhibited great success in image saliency prediction. However, few works apply DNNs to predict the saliency of generic videos. In this paper, we propose a novel DNN-based video saliency prediction method, called DeepVS2.0. Specifically, we establish a large-scale eye-tracking database of videos (LEDOV), which provides sufficient data to train DNN models for predicting video saliency. Through a statistical analysis of LEDOV, we find that human attention is normally attracted by objects, particularly moving objects or the moving parts of objects. Accordingly, we propose an object-to-motion convolutional neural network (OM-CNN) in DeepVS2.0 to learn spatio-temporal features for predicting intra-frame saliency by exploring the information of both objectness and object motion. We further find from our database that human attention has a temporal correlation, with a smooth saliency transition across video frames. Therefore, a saliency-structured convolutional long short-term memory network (SS-ConvLSTM) is developed in DeepVS2.0 to predict inter-frame saliency, using the extracted features of OM-CNN as input. Moreover, center-bias dropout and a sparsity-weighted loss are embedded in SS-ConvLSTM to account for the center bias and sparsity of human attention maps. Finally, the experimental results show that our DeepVS2.0 method advances the state of the art in video saliency prediction.
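The pipeline described above (per-frame OM-CNN features driving a convolutional LSTM that emits one saliency map per frame) can be illustrated with a minimal PyTorch sketch. The code below is an assumption-laden toy, not the authors' implementation: the small convolutional stack stands in for OM-CNN, and all layer widths, kernel sizes, and the readout head are illustrative choices.

import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Standard convolutional LSTM cell (all four gates from one convolution)."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        # One conv produces the input, forget, cell, and output gates.
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, g, o = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

class VideoSaliencyNet(nn.Module):
    def __init__(self, feat_ch=64, hid_ch=32):
        super().__init__()
        # Stand-in for OM-CNN: a small conv stack over each RGB frame.
        self.features = nn.Sequential(
            nn.Conv2d(3, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
        )
        self.convlstm = ConvLSTMCell(feat_ch, hid_ch)
        self.readout = nn.Conv2d(hid_ch, 1, 1)  # per-pixel saliency logit

    def forward(self, frames):  # frames: (batch, time, 3, H, W)
        b, t, _, hgt, wid = frames.shape
        h = frames.new_zeros(b, self.convlstm.hid_ch, hgt, wid)
        c = torch.zeros_like(h)
        maps = []
        for step in range(t):
            x = self.features(frames[:, step])
            h, c = self.convlstm(x, (h, c))       # inter-frame recurrence
            maps.append(torch.sigmoid(self.readout(h)))
        return torch.stack(maps, dim=1)           # (batch, time, 1, H, W)

# Smoke test on a tiny random clip.
clip = torch.randn(2, 4, 3, 32, 32)
print(VideoSaliencyNet()(clip).shape)  # torch.Size([2, 4, 1, 32, 32])

The abstract also mentions center-bias dropout and a sparsity-weighted loss, whose exact formulations are not given on this page. A hedged illustration of the general ideas follows: a keep-probability map that peaks at the frame center (a Gaussian prior is assumed here), and a loss that up-weights samples whose ground-truth attention maps are sparse. Both the prior and the weighting scheme are assumptions for illustration, not the paper's definitions.

import torch
import torch.nn.functional as F

def center_bias_keep_mask(h, w, sigma=0.5):
    """Keep probability peaked at the frame center (illustrative Gaussian prior).
    During training, hidden states could be multiplied by a Bernoulli mask
    sampled from these probabilities, so dropout spares central locations."""
    ys = torch.linspace(-1, 1, h).view(h, 1).expand(h, w)
    xs = torch.linspace(-1, 1, w).view(1, w).expand(h, w)
    return torch.exp(-(xs ** 2 + ys ** 2) / (2 * sigma ** 2))

def sparsity_weighted_bce(pred, target, eps=1e-6):
    """Binary cross-entropy scaled per sample by how sparse (peaked) the
    ground-truth attention map is; a hypothetical stand-in for the paper's loss."""
    weight = 1.0 / (target.mean(dim=(-2, -1), keepdim=True) + eps)
    bce = F.binary_cross_entropy(pred, target, reduction='none')
    return (weight * bce).mean()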
