☆ 4.7 Article

What/Where to Look Next? Modeling Top-Down Visual Attention in Complex Interactive Environments

IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS (2014)

Journal

IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS

Volume 44, Issue 5, Pages 523-538

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TSMC.2013.2279715

Keywords

Bottom-up saliency; complex natural scenes; eye movement prediction; gaze prediction; interactive environments; task-driven attention; top-down attention; visual attention

Categories

Automation & Control Systems; Computer Science, Cybernetics

Funding

Defense Advanced Research Projects Agency [HR0011-10-C-0034]
National Science Foundation (CRCNS) [BCS-0827764]
General Motors Corporation
Army Research Office [W911NF-08-1-0360, W911NF-11-1-0046]

Abstract

Several visual attention models have been proposed for describing eye movements over simple stimuli and tasks such as free viewing or visual search. Yet, to date, there exists no computational framework that can reliably mimic human gaze behavior in more complex environments and tasks such as urban driving. In addition, benchmark datasets, scoring techniques, and top-down model architectures are not yet well understood. In this paper, we describe new task-dependent approaches for modeling top-down overt visual attention based on graphical models for probabilistic inference and reasoning. We describe a dynamic Bayesian network that infers probability distributions over attended objects and spatial locations directly from observed data. Probabilistic inference in our model is performed over object-related functions that are fed from manual annotations of objects in video scenes or by state-of-the-art object detection/recognition algorithms. Evaluating over approximately 3 h (approximately 315 000 eye fixations and 12 600 saccades) of observers playing three video games (time-scheduling, driving, and flight combat), we show that our approach is significantly more predictive of eye fixations compared to: 1) simpler classifier-based models also developed here that map a signature of a scene (multimodal information from gist, bottom-up saliency, physical actions, and events) to eye positions; 2) 14 state-of-the-art bottom-up saliency models; and 3) brute-force algorithms such as mean eye position. Our results show that the proposed model is more effective in employing and reasoning over spatio-temporal visual data compared with the state-of-the-art.
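
The abstract describes inferring a probability distribution over attended objects from observed data with a dynamic Bayesian network. As a rough illustration only, the sketch below runs forward filtering on the simplest such network: a single discrete hidden variable (the attended object) with a first-order transition model. It is not the authors' model. The function name `forward_filter`, the two-object setup, and all probabilities are hypothetical values chosen for the example.

```python
from typing import List, Sequence


def forward_filter(
    prior: Sequence[float],
    transition: Sequence[Sequence[float]],
    likelihoods: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Filtered posterior over the attended object at each frame.

    prior[i]          -- assumed P(attended object = i) before the first frame
    transition[i][j]  -- assumed P(object j at frame t | object i at frame t-1)
    likelihoods[t][j] -- assumed P(evidence at frame t | attended object = j)
    """
    n = len(prior)
    belief = list(prior)
    posteriors: List[List[float]] = []
    for frame_likelihood in likelihoods:
        # Predict: push the previous belief through the transition model.
        predicted = [
            sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)
        ]
        # Update: weight each object by how well it explains this frame's evidence.
        unnormalized = [predicted[j] * frame_likelihood[j] for j in range(n)]
        total = sum(unnormalized)
        if total == 0.0:
            raise ValueError("evidence has zero probability under every object")
        belief = [u / total for u in unnormalized]
        posteriors.append(belief)
    return posteriors


# Hypothetical two-object scene: attention tends to stay on the same object,
# and the per-frame evidence favors object 0 in both frames.
prior = [0.5, 0.5]
transition = [[0.9, 0.1], [0.1, 0.9]]
likelihoods = [[0.8, 0.2], [0.8, 0.2]]
posteriors = forward_filter(prior, transition, likelihoods)
```

Each entry of `posteriors` is a normalized distribution over the objects for one frame. A gaze predictor in this spirit could take the most probable object's location as its fixation estimate, although how the paper maps objects to spatial locations is not specified in the abstract.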
