Article

Visual Scanpath Prediction Using IOR-ROI Recurrent Mixture Density Network

Journal

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPAMI.2019.2956930

Keywords

Visualization; Predictive models; Computational modeling; Feature extraction; Hidden Markov models; Solid modeling; Semantics; Visual scanpath prediction; fixation duration prediction; inhibition of return; LSTM; mixture density network

Funding

  1. National Key R&D Program of China [2017YFB1002202]
  2. National Natural Science Foundation of China [61771348]

A deep learning model is proposed in this paper to predict human-like visual scanpaths under task-free viewing conditions. The model combines features extracted by convolutional neural networks and simulates eye movements in different regions using Long Short-Term Memory networks.
A visual scanpath represents the human eye movements made when scanning the visual field to acquire and receive visual information. Predicting the visual scanpath elicited by a given stimulus plays an important role in modeling overt human visual attention and search behavior. In this paper, we present a framework based on an 'Inhibition of Return - Region of Interest' (IOR-ROI) recurrent mixture density network that learns to produce human-like visual scanpaths under task-free viewing conditions. The proposed model simultaneously predicts a sequence of ordered fixation positions and their corresponding fixation durations. Our model integrates bottom-up features and semantic features extracted by convolutional neural networks. The integrated feature maps are then fed into the IOR-ROI Long Short-Term Memory (LSTM), which is the core component of the proposed model. The IOR-ROI LSTM is a dual LSTM unit, consisting of the IOR-LSTM and the ROI-LSTM, that captures IOR dynamics and gaze shift behavior simultaneously. The IOR-LSTM simulates visual working memory to adaptively maintain and update visual information about previously fixated regions. The ROI-LSTM predicts the next possible ROIs given the image feature maps, which are spatially inhibited on a feature-wise basis. Fixation duration is predicted by a regression neural network given the viewing history and the image feature maps corresponding to the currently fixated ROI. To account for variations in eye movement patterns among subjects, a mixture density network is adopted to model the next fixation distribution as a mixture of Gaussians, and the fixation duration is also modeled using a Gaussian distribution. Our model is evaluated on the OSIE and MIT low resolution eye-tracking datasets, and experimental results indicate that the proposed method can achieve superior performance in predicting visual scanpaths. The code will be made publicly available at https://github.com/sunwj/scanpath.
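The mixture density idea in the abstract, modeling the next fixation as a mixture of Gaussians, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the diagonal-covariance assumption, the normalized image coordinates, and the two-component example parameters are all hypothetical, chosen only to show how a mixture assigns a likelihood to an observed fixation and how a next fixation could be sampled from it.

```python
import math
import random


def gaussian_2d(x, y, mean, std):
    """Density of an axis-aligned (diagonal covariance) 2D Gaussian at (x, y)."""
    (mx, my), (sx, sy) = mean, std
    zx = (x - mx) / sx
    zy = (y - my) / sy
    return math.exp(-0.5 * (zx * zx + zy * zy)) / (2.0 * math.pi * sx * sy)


def mixture_nll(fixation, weights, means, stds):
    """Negative log-likelihood of one observed fixation under the mixture."""
    x, y = fixation
    density = sum(
        w * gaussian_2d(x, y, m, s) for w, m, s in zip(weights, means, stds)
    )
    # Small constant guards against log(0) for points far from every component.
    return -math.log(density + 1e-12)


def sample_fixation(weights, means, stds, rng):
    """Pick a component according to its weight, then draw from that Gaussian."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    (mx, my), (sx, sy) = means[k], stds[k]
    return rng.gauss(mx, sx), rng.gauss(my, sy)


# Hypothetical mixture parameters, standing in for what a network might output
# at one time step (coordinates normalized to the unit square).
weights = [0.7, 0.3]
means = [(0.3, 0.4), (0.8, 0.6)]
stds = [(0.05, 0.05), (0.1, 0.1)]

rng = random.Random(0)
next_fix = sample_fixation(weights, means, stds, rng)

# A fixation at a component mean is far more likely than one in an empty corner.
near = mixture_nll((0.3, 0.4), weights, means, stds)
far = mixture_nll((0.05, 0.95), weights, means, stds)
```

In a training setting, a loss of this form would typically be averaged over recorded fixations, which lets several modes coexist instead of forcing a single predicted location; that is one plausible reading of how a mixture accommodates variation among subjects.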
