☆ 4.8 Article

Visual Scanpath Prediction Using IOR-ROI Recurrent Mixture Density Network

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2021)

Journal

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

Volume 43, Issue 6, Pages 2101-2118

Publisher

IEEE COMPUTER SOC

DOI: 10.1109/TPAMI.2019.2956930

Keywords

Visualization; Predictive models; Computational modeling; Feature extraction; Hidden Markov models; Solid modeling; Semantics; Visual scanpath prediction; fixation duration prediction; inhibition of return; LSTM; mixture density network

Funding

National Key R&D Program of China [2017YFB1002202]
National Natural Science Foundation of China [61771348]

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Automated Summary New
Abstract

A deep learning model is proposed in this paper to predict human-like visual scanpaths under task-free viewing conditions. The model combines features extracted by convolutional neural networks and simulates eye movements in different regions using Long Short-Term Memory networks.

A visual scanpath represents the human eye movements when scanning the visual field for acquiring and receiving visual information. Predicting visual scanpaths when a certain stimulus is presented plays an important role in modeling overt human visual attention and search behavior. In this paper, we presented an 'Inhibition of Return - Region of Interest' (IOR-ROI) recurrent mixture density network based framework learning to produce human-like visual scanpaths under task-free viewing conditions. The proposed model simultaneously predicts a sequence of ordered fixation positions and their corresponding fixation durations. Our model integrates bottom-up features and semantic features extracted by convolutional neural networks. Then the integrated feature maps are fed into the IOR-ROI Long Short-Term Memory (LSTM) which is the core component of the proposed model. The IOR-ROI LSTM is a dual LSTM unit, i.e., the IOR-LSTM and the ROI-LSTM, capturing IOR dynamics and gaze shift behavior simultaneously. IOR-LSTM simulates the visual working memory to adaptively maintain and update visual information regarding previously fixated regions. ROI-LSTM is responsible for predicting the next possible ROIs given the spatially inhibited image feature maps on the feature-wise basis. Fixation duration is predicted by a regression neural network given the viewing history and image feature maps corresponding to currently fixated ROI. Considering the eye movement pattern variations among subjects, a mixture density network is adopted to model the next fixation distribution as Gaussian mixtures and the fixation duration is also modeled using Gaussian distribution. Our model is evaluated on the OSIE and MIT low resolution eye-tracking datasets and experimental results indicate that the proposed method can achieve superior performance in predicting visual scanpaths. The code will be publicly available on URL: https://github.com/sunwj/scanpath.

Visual Scanpath Prediction Using IOR-ROI Recurrent Mixture Density Network

Journal

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

Publisher

IEEE COMPUTER SOC

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Visual Scanpath Prediction Using IOR-ROI Recurrent Mixture Density Network

Journal

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

Publisher

IEEE COMPUTER SOC

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper