A computational model of task-dependent influences on eye position. Peters, R. J. & Itti, L. In Proc. Vision Science Society Annual Meeting (VSS06), May, 2006.
Computational models of bottom-up attention can perform significantly above chance at predicting eye positions of observers passively viewing static or dynamic images. Nevertheless, much of eye movement behavior (50 percent or more) is unexplained by purely bottom-up models, and is typically attributed to top-down, inter-observer, task-dependent, or random effects. Other studies have qualitatively described such high-level effects in naturalistic interactive visual tasks (e.g., while driving, how often do people fixate other cars, or the road, or road signs); yet the underlying neurocomputational mechanisms remain unknown. Here, we introduce a simple computational model of task-related eye position influences in interactive tasks with dynamic stimuli. This model extracts from each frame a low-dimensional feature signature ("gist"), compares that with a database of eye position training frames, and produces an eye position prediction map. Finally, we combine the task-related and bottom-up maps, and compare the combined maps with observers' actual eye positions across 216,000 frames from 24 five-minute videogame-playing sessions. For analysis, each map was rescaled to have zero mean and unit standard deviation; the average predicted value at human eye position locations was 0.61 +/- 0.1 in the purely bottom-up maps, and 2.42 +/- 0.07 in the combined maps (a random model gives an average value of 0). Thus, this straightforward model of task-dependent effects offers some of the strongest purely computational general-purpose eye movement predictions to date, going significantly beyond what is explained by purely bottom-up effects; yet it relies only on simple visual features, without requiring any high-level semantic scene description.
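The abstract outlines a concrete pipeline: extract a low-dimensional "gist" signature per frame, use it to index a database of training frames with recorded eye positions, produce a task-driven prediction map, combine it with a bottom-up saliency map, and score the result at observed eye positions after rescaling each map to zero mean and unit standard deviation. The following Python sketch illustrates that pipeline under stated assumptions; it is not the authors' code, the gist features and the combination rule (a sum of normalized maps) are stand-ins, and all function names are hypothetical.

    import numpy as np

    def gist_signature(frame, grid=(4, 4)):
        # Coarse low-dimensional signature: mean intensity over a spatial grid.
        # (The paper's gist features are richer; this is an illustrative stand-in.)
        h, w = frame.shape[:2]
        gh, gw = grid
        sig = [frame[i*h//gh:(i+1)*h//gh, j*w//gw:(j+1)*w//gw].mean()
               for i in range(gh) for j in range(gw)]
        return np.asarray(sig)

    def task_map_from_database(sig, train_sigs, train_maps, k=5):
        # Average the eye-position maps of the k training frames whose gist
        # signatures are nearest to the current frame's signature.
        dists = np.linalg.norm(train_sigs - sig, axis=1)
        nearest = np.argsort(dists)[:k]
        return train_maps[nearest].mean(axis=0)

    def zscore(m):
        # Rescale a map to zero mean and unit standard deviation, as in the analysis.
        return (m - m.mean()) / (m.std() + 1e-12)

    def combined_prediction(bu_map, td_map):
        # One plausible combination rule (an assumption, not the paper's stated rule):
        # sum the normalized bottom-up and task-driven maps, then renormalize.
        return zscore(zscore(bu_map) + zscore(td_map))

    def score_at_eye_position(pred_map, eye_xy):
        # NSS-style score: value of the normalized map at the observed eye position.
        # A random model averages 0; higher means better prediction.
        x, y = eye_xy
        return zscore(pred_map)[y, x]

Averaging score_at_eye_position over all frames of a session yields the kind of numbers reported in the abstract (0.61 for purely bottom-up maps, 2.42 for combined maps).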
@inproceedings{Peters_Itti06vss,
  author = {R. J. Peters and L. Itti},
  title = {A computational model of task-dependent influences on eye position},
  abstract = {Computational models of bottom-up attention can perform
significantly above chance at predicting eye positions of observers
passively viewing static or dynamic images. Nevertheless, much of eye
movement behavior (50 percent or more) is unexplained by purely
bottom-up models, and is typically attributed to top-down,
inter-observer, task-dependent, or random effects. Other studies have
qualitatively described such high-level effects in naturalistic
interactive visual tasks (e.g., while driving, how often do people
fixate other cars, or the road, or road signs); yet the underlying
neurocomputational mechanisms remain unknown. Here, we introduce a
simple computational model of task-related eye position influences in
interactive tasks with dynamic stimuli. This model extracts from each
frame a low-dimensional feature signature ("gist"), compares that
with a database of eye position training frames, and produces an eye
position prediction map. Finally, we combine the task-related and
bottom-up maps, and compare the combined maps with observers' actual
eye positions across 216,000 frames from 24 five-minute
videogame-playing sessions. For analysis, each map was rescaled to
have zero mean and unit standard deviation; the average predicted
value at human eye position locations was 0.61 +/- 0.1 in the purely
bottom-up maps, and 2.42 +/- 0.07 in the combined maps (a random model
gives an average value of 0). Thus, this straightforward model of
task-dependent effects offers some of the strongest purely
computational general-purpose eye movement predictions to date, going
significantly beyond what is explained by purely bottom-up effects;
yet it relies only on simple visual features, without requiring any
high-level semantic scene description.},
  booktitle = {Proc. Vision Science Society Annual Meeting (VSS06)},
  year = {2006},
  month = {May},
  type = {mod;bu;td;eye},
  review = {abs/conf}
}