Quantitative Analysis of Human-Model Agreement in Visual Saliency Modeling: A Comparative Study. Borji, A., Sihite, D. N., & Itti, L. IEEE Transactions on Image Processing, 2012.

Abstract: Visual attention is a process that enables biological and machine vision systems to select the most relevant regions from a scene. Relevance is determined by two components: 1) top-down factors driven by task, and 2) bottom-up factors that highlight image regions that differ from their surroundings; the latter are often referred to as "visual saliency". Modeling bottom-up visual saliency has been the subject of numerous research efforts during the past 20 years, with many successful applications in computer vision and robotics. Available models have been tested on different datasets (e.g., synthetic psychological search arrays, natural images, or videos) using different evaluation scores (e.g., search slopes, comparison to human eye tracking) and parameter settings, which has made direct comparison of models difficult. Here we perform an exhaustive comparison of 35 state-of-the-art saliency models over 54 challenging synthetic patterns, 3 natural-image datasets, and 2 video datasets, using 3 evaluation scores. We find that although model rankings vary, some models consistently perform better. Analysis of the datasets reveals that existing datasets are highly center-biased, which influences some of the evaluation scores. Computational-complexity analysis shows that some models are very fast yet yield competitive eye-movement prediction accuracy. Different models often share common easy/difficult stimuli. Furthermore, several concerns in visual saliency modeling, eye-movement datasets, and evaluation scores are discussed, and insights for future work are provided. Our study allows one to assess the state of the art, helps organize this rapidly growing field, and sets a unified comparison framework for gauging future efforts, similar to the PASCAL VOC challenge in the object recognition and detection domains.
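The abstract does not spell out which three evaluation scores are used. In this benchmarking literature they are typically the linear correlation coefficient (CC), normalized scanpath saliency (NSS), and variants of AUC. As an illustrative sketch only — the metric definitions below are the standard ones from the field, and the function names, parameters, and center-prior baseline are assumptions, not taken from the paper — the following NumPy code computes NSS and CC and builds the kind of Gaussian center-prior map that makes the center-bias concern concrete:

import numpy as np

def nss(saliency, fixation_mask):
    # Normalized Scanpath Saliency: mean of the z-scored saliency map
    # at human fixation locations (chance level is roughly 0).
    z = (saliency - saliency.mean()) / (saliency.std() + 1e-12)
    return z[fixation_mask.astype(bool)].mean()

def cc(saliency, fixation_density):
    # Linear correlation coefficient between a saliency map and a
    # (typically Gaussian-blurred) human fixation density map.
    return np.corrcoef(saliency.ravel(), fixation_density.ravel())[0, 1]

def center_prior(height, width, sigma_frac=0.25):
    # Pure center-bias baseline: an isotropic Gaussian centered on the
    # image. On center-biased datasets this trivial, image-independent
    # map can rival real saliency models under some scores, which is
    # the evaluation concern the abstract raises.
    ys, xs = np.mgrid[0:height, 0:width].astype(float)
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    sigma = sigma_frac * min(height, width)
    g = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))
    return g / g.max()

For example, nss(center_prior(480, 640), fixation_mask) can score well above chance on a center-biased dataset even though the map ignores the image entirely, which is why scores that discount center bias are worth reporting alongside raw NSS, CC, and AUC.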
@article{Borji_etal12tip,
author = {A. Borji and D. N. Sihite and L. Itti},
title = {Quantitative Analysis of Human-Model Agreement in Visual Saliency Modeling: A Comparative Study},
journal = {IEEE Transactions on Image Processing},
abstract = {Visual attention is a process that enables biological and machine vision systems to select the most relevant
regions from a scene. Relevance is determined by two components: 1) top-down factors driven by task
and 2) bottom-up factors that highlight image regions that are different from their surroundings. The
latter are often referred to as 'visual saliency'. Modeling bottom-up visual saliency has been the
subject of numerous research efforts during the past 20 years, with many successful applications in
computer vision and robotics. Available models have been tested with different datasets (e.g.,
synthetic psychological search arrays, natural images or videos) using different evaluation scores
(e.g., search slopes, comparison to human eye tracking) and parameter settings. This has made direct
comparison of models difficult. Here we perform an exhaustive comparison of 35 state-of-the-art
saliency models over 54 challenging synthetic patterns, 3 natural image datasets, and 2 video
datasets, using 3 evaluation scores. We find that although model rankings vary, some models
consistently perform better. Analysis of datasets reveals that existing datasets are highly
center-biased, which influences some of the evaluation scores. Computational complexity analysis
shows that some models are very fast, yet yield competitive eye movement prediction
accuracy. Different models often have common easy/difficult stimuli. Furthermore, several concerns in
visual saliency modeling, eye movement datasets, and evaluation scores are discussed and insights for
future work are provided. Our study allows one to assess the state of the art, helps organize this
rapidly growing field, and sets a unified comparison framework for gauging future efforts, similar to
the PASCAL VOC challenge in the object recognition and detection domains.},
pages = {1-16 (in press)},
year = {2012},
type = {mod;bu},
if = {2011 Impact Factor: 3.042},
file = {http://ilab.usc.edu/publications/doc/Borji_etal12tip.pdf}
}
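Note that type, if, and file are nonstandard BibTeX fields; standard bibliography styles ignore them in an @article entry, but they parse as ordinary key/value pairs. To pull this entry out of a .bib file programmatically, here is a minimal sketch using the third-party bibtexparser package (v1 API assumed); the file name ilab.bib is an illustrative assumption, not part of this record:

import bibtexparser  # third-party; v1 API assumed (pip install bibtexparser)

# Parse the .bib file and look up the entry above by its citation key.
with open("ilab.bib") as f:
    db = bibtexparser.load(f)

entry = next(e for e in db.entries if e["ID"] == "Borji_etal12tip")
print(entry["title"])                   # field names are lowercased by the parser
print(entry["year"], entry["journal"])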
{"_id":{"_str":"5298a1a09eb585cc2600086c"},"__v":0,"authorIDs":[],"author_short":["Borji, A.","Sihite, D.<nbsp>N.","Itti, L."],"bibbaseid":"borji-sihite-itti-quantitativeanalysisofhumanmodelagreementinvisualsaliencymodelingacomparativestudy-2012","bibdata":{"html":"<div class=\"bibbase_paper\"> \n\n\n<span class=\"bibbase_paper_titleauthoryear\">\n\t<span class=\"bibbase_paper_title\"><a name=\"Borji_etal12tip\"> </a>Quantitative Analysis of Human-Model Agreement in Visual Saliency Modeling: A Comparative Study.</span>\n\t<span class=\"bibbase_paper_author\">\nBorji, A.; Sihite, D. N.; and Itti, L.</span>\n\t<!-- <span class=\"bibbase_paper_year\">2012</span>. -->\n</span>\n\n\n\n<i>IEEE Transactions on Image Processing</i>,\n\n1-16 (in press).\n\n 2012.\n\n\n\n\n<br class=\"bibbase_paper_content\"/>\n\n<span class=\"bibbase_paper_content\">\n \n \n \n <a href=\"javascript:showBib('Borji_etal12tip')\"\n class=\"bibbase link\">\n <!-- <img src=\"http://www.bibbase.org/img/filetypes/bib.png\" -->\n\t<!-- alt=\"Quantitative Analysis of Human-Model Agreement in Visual Saliency Modeling: A Comparative Study [bib]\" -->\n\t<!-- class=\"bibbase_icon\" -->\n\t<!-- style=\"width: 24px; height: 24px; border: 0px; vertical-align: text-top\"><span class=\"bibbase_icon_text\">Bibtex</span> -->\n BibTeX\n <i class=\"fa fa-caret-down\"></i></a>\n \n \n \n <a class=\"bibbase_abstract_link bibbase link\"\n href=\"javascript:showAbstract('Borji_etal12tip')\">\n Abstract\n <i class=\"fa fa-caret-down\"></i></a>\n \n \n \n\n \n \n \n</span>\n\n<div class=\"well well-small bibbase\" id=\"bib_Borji_etal12tip\"\n style=\"display:none\">\n <pre>@article{ Borji_etal12tip,\n author = {A. Borji and D. N. Sihite and L. Itti},\n title = {Quantitative Analysis of Human-Model Agreement in Visual Saliency Modeling: A Comparative Study},\n journal = {IEEE Transactions on Image Processing},\n abstract = {Visual attention is a process that enables biological and machine vision systems to select the most relevant\n regions from a scene. Relevance is determined by two components: 1) top-down factors driven by task\n and 2) bottom-up factors that highlight image regions that are different from their surroundings. The\n latter are often referred to as 'visual saliency'. Modeling bottom-up visual saliency has been the\n subject of numerous research efforts during the past 20 years, with many successful applications in\n computer vision and robotics. Available models have been tested with different datasets (e.g.,\n synthetic psychological search arrays, natural images or videos) using different evaluation scores\n (e.g., search slopes, comparison to human eye tracking) and parameter settings. This has made direct\n comparison of models difficult. Here we perform an exhaustive comparison of 35 state-of-the-art\n saliency models over 54 challenging synthetic patterns, 3 natural image datasets, and 2 video\n datasets, using 3 evaluation scores. We find that although model rankings vary, some models\n consistently perform better. Analysis of datasets reveals that existing datasets are highly\n center-biased, which influences some of the evaluation scores. Computational complexity analysis\n shows that some models are very fast, yet yield competitive eye movement prediction\n accuracy. Different models often have common easy/difficult stimuli. Furthermore, several concerns in\n visual saliency modeling, eye movement datasets, and evaluation scores are discussed and insights for\n future work are provided. 
Our study allows one to assess the state-of-the-art, helps organizing this\n rapidly growing field, and sets a unified comparison framework for gauging future efforts, similar to\n the PASCAL VOC challenge in the object recognition and detection domains.},\n pages = {1-16 (in press)},\n year = {2012},\n type = {mod;bu},\n if = {2011 Impact Factor: 3.042},\n file = {http://ilab.usc.edu/publications/doc/Borji_etal12tip.pdf}\n}</pre>\n</div>\n\n\n<div class=\"well well-small bibbase\" id=\"abstract_Borji_etal12tip\"\n style=\"display:none\">\n Visual attention is a process that enables biological and machine vision systems to select the most relevant regions from a scene. Relevance is determined by two components: 1) top-down factors driven by task and 2) bottom-up factors that highlight image regions that are different from their surroundings. The latter are often referred to as 'visual saliency'. Modeling bottom-up visual saliency has been the subject of numerous research efforts during the past 20 years, with many successful applications in computer vision and robotics. Available models have been tested with different datasets (e.g., synthetic psychological search arrays, natural images or videos) using different evaluation scores (e.g., search slopes, comparison to human eye tracking) and parameter settings. This has made direct comparison of models difficult. Here we perform an exhaustive comparison of 35 state-of-the-art saliency models over 54 challenging synthetic patterns, 3 natural image datasets, and 2 video datasets, using 3 evaluation scores. We find that although model rankings vary, some models consistently perform better. Analysis of datasets reveals that existing datasets are highly center-biased, which influences some of the evaluation scores. Computational complexity analysis shows that some models are very fast, yet yield competitive eye movement prediction accuracy. Different models often have common easy/difficult stimuli. Furthermore, several concerns in visual saliency modeling, eye movement datasets, and evaluation scores are discussed and insights for future work are provided. Our study allows one to assess the state-of-the-art, helps organizing this rapidly growing field, and sets a unified comparison framework for gauging future efforts, similar to the PASCAL VOC challenge in the object recognition and detection domains.\n</div>\n\n\n</div>\n","downloads":0,"bibbaseid":"borji-sihite-itti-quantitativeanalysisofhumanmodelagreementinvisualsaliencymodelingacomparativestudy-2012","role":"author","year":"2012","type":"mod;bu","title":"Quantitative Analysis of Human-Model Agreement in Visual Saliency Modeling: A Comparative Study","pages":"1-16 (in press)","key":"Borji_etal12tip","journal":"IEEE Transactions on Image Processing","if":"2011 Impact Factor: 3.042","id":"Borji_etal12tip","file":"http://ilab.usc.edu/publications/doc/Borji_etal12tip.pdf","bibtype":"article","bibtex":"@article{ Borji_etal12tip,\n author = {A. Borji and D. N. Sihite and L. Itti},\n title = {Quantitative Analysis of Human-Model Agreement in Visual Saliency Modeling: A Comparative Study},\n journal = {IEEE Transactions on Image Processing},\n abstract = {Visual attention is a process that enables biological and machine vision systems to select the most relevant\n regions from a scene. Relevance is determined by two components: 1) top-down factors driven by task\n and 2) bottom-up factors that highlight image regions that are different from their surroundings. 
The\n latter are often referred to as 'visual saliency'. Modeling bottom-up visual saliency has been the\n subject of numerous research efforts during the past 20 years, with many successful applications in\n computer vision and robotics. Available models have been tested with different datasets (e.g.,\n synthetic psychological search arrays, natural images or videos) using different evaluation scores\n (e.g., search slopes, comparison to human eye tracking) and parameter settings. This has made direct\n comparison of models difficult. Here we perform an exhaustive comparison of 35 state-of-the-art\n saliency models over 54 challenging synthetic patterns, 3 natural image datasets, and 2 video\n datasets, using 3 evaluation scores. We find that although model rankings vary, some models\n consistently perform better. Analysis of datasets reveals that existing datasets are highly\n center-biased, which influences some of the evaluation scores. Computational complexity analysis\n shows that some models are very fast, yet yield competitive eye movement prediction\n accuracy. Different models often have common easy/difficult stimuli. Furthermore, several concerns in\n visual saliency modeling, eye movement datasets, and evaluation scores are discussed and insights for\n future work are provided. Our study allows one to assess the state-of-the-art, helps organizing this\n rapidly growing field, and sets a unified comparison framework for gauging future efforts, similar to\n the PASCAL VOC challenge in the object recognition and detection domains.},\n pages = {1-16 (in press)},\n year = {2012},\n type = {mod;bu},\n if = {2011 Impact Factor: 3.042},\n file = {http://ilab.usc.edu/publications/doc/Borji_etal12tip.pdf}\n}","author_short":["Borji, A.","Sihite, D.<nbsp>N.","Itti, L."],"author":["Borji, A.","Sihite, D. N.","Itti, L."],"abstract":"Visual attention is a process that enables biological and machine vision systems to select the most relevant regions from a scene. Relevance is determined by two components: 1) top-down factors driven by task and 2) bottom-up factors that highlight image regions that are different from their surroundings. The latter are often referred to as 'visual saliency'. Modeling bottom-up visual saliency has been the subject of numerous research efforts during the past 20 years, with many successful applications in computer vision and robotics. Available models have been tested with different datasets (e.g., synthetic psychological search arrays, natural images or videos) using different evaluation scores (e.g., search slopes, comparison to human eye tracking) and parameter settings. This has made direct comparison of models difficult. Here we perform an exhaustive comparison of 35 state-of-the-art saliency models over 54 challenging synthetic patterns, 3 natural image datasets, and 2 video datasets, using 3 evaluation scores. We find that although model rankings vary, some models consistently perform better. Analysis of datasets reveals that existing datasets are highly center-biased, which influences some of the evaluation scores. Computational complexity analysis shows that some models are very fast, yet yield competitive eye movement prediction accuracy. Different models often have common easy/difficult stimuli. Furthermore, several concerns in visual saliency modeling, eye movement datasets, and evaluation scores are discussed and insights for future work are provided. 
Our study allows one to assess the state-of-the-art, helps organizing this rapidly growing field, and sets a unified comparison framework for gauging future efforts, similar to the PASCAL VOC challenge in the object recognition and detection domains."},"bibtype":"article","biburl":"http://ilab.usc.edu/publications/src/ilab.bib","downloads":0,"search_terms":["quantitative","analysis","human","model","agreement","visual","saliency","modeling","comparative","study","borji","sihite","itti"],"title":"Quantitative Analysis of Human-Model Agreement in Visual Saliency Modeling: A Comparative Study","year":2012,"dataSources":["wedBDxEpNXNCLZ2sZ"]}