What/Where to Look Next? Modeling Top-down Visual Attention in Complex Interactive Environments. Borji, A., Sihite, D. N., & Itti, L. IEEE Transactions on Systems, Man, and Cybernetics, Part A - Systems and Humans, 2012.

Abstract: Several visual attention models have been proposed for describing eye movements over simple stimuli and tasks such as free viewing or visual search. Yet to date, there exists no computational framework that can reliably mimic human gaze behavior in more complex environments and tasks such as urban driving. Additionally, benchmark datasets, scoring techniques, and top-down model architectures are not yet well understood. In this study, we describe new task-dependent approaches for modeling top-down overt visual attention based on graphical models for probabilistic inference and reasoning. We describe a Dynamic Bayesian Network (DBN) that infers probability distributions over attended objects and spatial locations directly from observed data. Probabilistic inference in our model is performed over object-related functions, which are fed from manual annotations of objects in video scenes or by state-of-the-art object detection/recognition algorithms. Evaluating over approx. 3 hours (approx. 315,000 eye fixations and 12,600 saccades) of observers playing 3 video games (time-scheduling, driving, and flight combat), we show that our approach is significantly more predictive of eye fixations compared to: (1) simpler classifier-based models also developed here that map a signature of a scene (multi-modal information from gist, bottom-up saliency, physical actions, and events) to eye positions, (2) 14 state-of-the-art bottom-up saliency models, and (3) brute-force algorithms such as mean eye position. Our results show that the proposed model is more effective in employing and reasoning over spatio-temporal visual data compared with the state-of-the-art.
@article{Borji_etal12smc,
author = {A. Borji and D. N. Sihite and L. Itti},
title = {What/Where to Look Next? Modeling Top-down Visual Attention in Complex Interactive Environments},
journal = {IEEE Transactions on Systems, Man, and Cybernetics, Part A - Systems and Humans},
abstract = {Several visual attention models have been proposed for describing eye movements over simple stimuli and tasks
such as free viewing or visual search. Yet to date, there exists no computational framework that can
reliably mimic human gaze behavior in more complex environments and tasks such as urban
driving. Additionally, benchmark datasets, scoring techniques, and top-down model architectures are
not yet well understood. In this study, we describe new task-dependent approaches for modeling
top-down overt visual attention based on graphical models for probabilistic inference and
reasoning. We describe a Dynamic Bayesian Network (DBN) that infers probability distributions over
attended objects and spatial locations directly from observed data. Probabilistic inference in our
model is performed over object-related functions, which are fed from manual annotations of objects in
video scenes or by state-of-the-art object detection/recognition algorithms. Evaluating over approx. 3
hours (approx. 315,000 eye fixations and 12,600 saccades) of observers playing 3 video games
(time-scheduling, driving, and flight combat), we show that our approach is significantly more
predictive of eye fixations compared to: (1) simpler classifier-based models also developed here that
map a signature of a scene (multi-modal information from gist, bottom-up saliency, physical actions,
and events) to eye positions, (2) 14 state-of-the-art bottom-up saliency models, and (3) brute-force
algorithms such as mean eye position. Our results show that the proposed model is more effective in
employing and reasoning over spatio-temporal visual data compared with the state-of-the-art.},
pages = {1-16 (in press)},
year = {2012},
type = {mod;td},
if = {2011 Impact Factor: 2.123},
file = {http://ilab.usc.edu/publications/doc/Borji_etal12smc.pdf}
}