Beyond bottom-up: Incorporating task-dependent influences into a computational model of spatial attention. Peters, R. J. & Itti, L. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Minneapolis, MN, June 2007.

Abstract: A critical function in both machine vision and biological vision systems is attentional selection of scene regions worthy of further analysis by higher-level processes such as object recognition. Here we present the first model of spatial attention that (1) can be applied to arbitrary static and dynamic image sequences with interactive tasks and (2) combines a general computational implementation of both bottom-up (BU) saliency and dynamic top-down (TD) task relevance; the claimed novelty lies in the combination of these elements and in the fully computational nature of the model. The BU component computes a saliency map from 12 low-level multi-scale visual features. The TD component computes a low-level signature of the entire image, and learns to associate different classes of signatures with the different gaze patterns recorded from human subjects performing a task of interest. We measured the ability of this model to predict the eye movements of people playing contemporary video games. We found that the TD model alone predicts where humans look about twice as well as does the BU model alone; in addition, a combined BU*TD model performs significantly better than either individual component. Qualitatively, the combined model predicts some easy-to-describe but hard-to-compute aspects of attentional selection, such as shifting attention leftward when approaching a left turn along a racing track. Thus, our study demonstrates the advantages of integrating BU factors derived from a saliency map and TD factors learned from image and task contexts in predicting where humans look while performing complex visually-guided behavior.
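The abstract describes a concrete pipeline: a BU saliency map computed from low-level multi-scale features, a TD task-relevance map predicted from a coarse whole-image signature that has been associated with recorded human gaze patterns, and a pointwise BU*TD product. The Python/NumPy sketch below illustrates only that structure; the center-surround contrast, block-mean signature, and least-squares regression are placeholder stand-ins for the paper's 12-feature saliency computation, its low-level gist signature, and its signature-to-gaze-pattern learning, and none of the function names or parameters come from the paper.

import numpy as np

def box_blur(img, k):
    # Simple box filter via edge-padding and summation (illustration only).
    pad = np.pad(img, k // 2, mode='edge')
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += pad[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def bottom_up_saliency(frame):
    # Stand-in for the paper's 12-feature multi-scale saliency map:
    # a single center-surround intensity contrast, normalized to [0, 1].
    gray = frame.mean(axis=2)
    sal = np.abs(box_blur(gray, 5) - box_blur(gray, 21))
    return sal / (sal.max() + 1e-9)

def gist_signature(frame, grid=4):
    # Stand-in for the paper's low-level whole-image signature:
    # per-channel means over a coarse grid of blocks.
    h, w, c = frame.shape
    bh, bw = h // grid, w // grid
    blocks = frame[:grid * bh, :grid * bw].reshape(grid, bh, grid, bw, c)
    return blocks.mean(axis=(1, 3)).ravel()

def train_top_down(signatures, gaze_maps):
    # Stand-in for the paper's learning step: least-squares regression
    # from image signatures to (downsampled) human gaze-density maps.
    X = np.stack(signatures)                       # (n_frames, d)
    Y = np.stack([g.ravel() for g in gaze_maps])   # (n_frames, h*w)
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def top_down_map(signature, W, shape):
    td = (signature @ W).reshape(shape)
    td = np.clip(td, 0.0, None)
    return td / (td.max() + 1e-9)

def combined_map(frame, W, td_shape):
    bu = bottom_up_saliency(frame)
    td = top_down_map(gist_signature(frame), W, td_shape)
    # Upsample the coarse TD map to frame resolution by repetition,
    # then take the pointwise BU*TD product described in the abstract.
    ry, rx = bu.shape[0] // td_shape[0], bu.shape[1] // td_shape[1]
    td_full = np.kron(td, np.ones((ry, rx)))
    return bu[:td_full.shape[0], :td_full.shape[1]] * td_full

In use, W would be fit on (signature, gaze-density map) pairs collected while subjects play the game, and combined_map would then be applied to unseen frames.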
@inproceedings{Peters_Itti07cvpr,
author = {R. J. Peters and L. Itti},
title = {Beyond bottom-up: Incorporating task-dependent influences into a
computational model of spatial attention},
abstract = {A critical function in both machine vision and biological
vision systems is attentional selection of scene
regions worthy of further analysis by higher-level
processes such as object recognition. Here we
present the first model of spatial attention that
(1) can be applied to arbitrary static and dynamic
image sequences with interactive tasks and (2)
combines a general computational implementation of
both bottom-up (BU) saliency and dynamic top-down
(TD) task relevance; the claimed novelty lies in the
combination of these elements and in the fully
computational nature of the model. The BU component
computes a saliency map from 12 low-level
multi-scale visual features. The TD component
computes a low-level signature of the entire image,
and learns to associate different classes of
signatures with the different gaze patterns recorded
from human subjects performing a task of
interest. We measured the ability of this model to
predict the eye movements of people playing
contemporary video games. We found that the TD model
alone predicts where humans look about twice as well
as does the BU model alone; in addition, a combined
BU*TD model performs significantly better than
either individual component. Qualitatively, the
combined model predicts some easy-to-describe but
hard-to-compute aspects of attentional selection,
such as shifting attention leftward when approaching
a left turn along a racing track. Thus, our study
demonstrates the advantages of integrating BU
factors derived from a saliency map and TD factors
learned from image and task contexts in predicting
where humans look while performing complex
visually-guided behavior.},
booktitle = {Proc. IEEE Conference on Computer Vision and Pattern
Recognition (CVPR)},
address = {Minneapolis, MN},
month = {Jun},
year = {2007},
type = {bu ; cv ; td ; eye ; mod},
file = {http://ilab.usc.edu/publications/doc/Peters_Itti07cvpr.pdf},
if = {2007 acceptance rate: 28\%},
review = {full/conf}
}
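The abstract's quantitative claim (the TD model roughly twice as predictive as BU, and BU*TD better than either) presupposes a score for how well a map predicts human gaze. The paper's exact metric is not reproduced here; a common choice for this kind of comparison, shown below purely as an assumption, is a normalized-scanpath-salience-style score: z-score the prediction map, then average its values at the recorded gaze positions.

import numpy as np

def nss_score(pred_map, gaze_points):
    # Assumed metric (not necessarily the paper's): z-score the map,
    # then average its values at the human gaze positions (y, x).
    # Higher values mean the map better predicts where humans looked.
    z = (pred_map - pred_map.mean()) / (pred_map.std() + 1e-9)
    return float(np.mean([z[y, x] for (y, x) in gaze_points]))

Under a score of this form, the abstract's comparison amounts to evaluating nss_score for the BU, TD, and BU*TD maps on held-out frames and comparing the averages across subjects.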