Modeling the Influence of Action on Spatial Attention in Visual Interactive Environments. Borji, A., Sihite, D. N., & Itti, L. In Proc. IEEE International Conference on Robotics and Automation (ICRA), pages 1-7, May, 2012.
A large number of studies have been reported on top-down influences on visual attention. However, less progress has been made in understanding and modeling its mechanisms in real-world tasks. In this paper, we propose an approach for learning spatial attention that takes into account the influence of physical actions on top-down attention. For this purpose, we focus on interactive visual environments (video games), which are modest real-world simulations in which a player has to attend to certain aspects of visual stimuli and perform actions to achieve a goal. The basic idea is to learn a mapping from the current mental state of the game player, represented by past actions and observations, to the player's gaze fixation. We follow a data-driven approach in which we train a model on the data of some players and test it on a new subject. In particular, this paper makes two contributions: 1) employing multimodal information, including mean eye position, gist of a scene, physical actions, bottom-up saliency, and tagged events, for state representation, and 2) analyzing different methods of combining bottom-up and top-down influences. Compared with other top-down task-driven and bottom-up spatio-temporal models, our approach achieves higher NSS scores in predicting eye positions.
@inproceedings{ Borji_etal12icra,
  author = {A. Borji and D. N. Sihite and L. Itti},
  title = {Modeling the Influence of Action on Spatial Attention in Visual Interactive Environments},
  booktitle = {Proc. IEEE International Conference on Robotics and Automation (ICRA)},
  abstract = {A large number of studies have been reported on top-down
                  influences on visual attention. However, less
                  progress has been made in understanding and
                  modeling its mechanisms in real-world tasks. In this
                  paper, we propose an approach for learning spatial
                  attention that takes into account the influence of
                  physical actions on top-down attention. For this
                  purpose, we focus on interactive visual environments
                  (video games), which are modest real-world
                  simulations in which a player has to attend to
                  certain aspects of visual stimuli and perform
                  actions to achieve a goal. The basic idea is to
                  learn a mapping from the current mental state of the
                  game player, represented by past actions and
                  observations, to the player's gaze fixation. We
                  follow a data-driven approach in which we train a
                  model on the data of some players and test it on a
                  new subject. In particular, this paper makes two
                  contributions: 1) employing multimodal information,
                  including mean eye position, gist of a scene,
                  physical actions, bottom-up saliency, and tagged
                  events, for state representation, and 2) analyzing
                  different methods of combining bottom-up and
                  top-down influences. Compared with other top-down
                  task-driven and bottom-up spatio-temporal models,
                  our approach achieves higher NSS scores in
                  predicting eye positions.},
  year = {2012},
  month = {May},
  pages = {1--7},
  review = {full/conf},
  type = {mod;td;cv},
  if = {2012 acceptance rate: 40%},
  file = {http://ilab.usc.edu/publications/doc/Borji_etal12icra.pdf}
}
