Mechanisms of Prompt-Induced Hallucination in Vision-Language Models

Mechanisms of Prompt-Induced Hallucination in Vision-Language Models. Rudman, W., Golovanevsky, M., Arad, D., Belinkov, Y., Singh, R., Eickhoff, C., & Mahowald, K. April, 2026. arXiv:2601.05201 [cs]


Large vision-language models (VLMs) are highly capable, yet often hallucinate by favoring textual prompts over visual evidence. We study this failure mode in a controlled object-counting setting, where the prompt overstates the number of objects in the image (e.g., asking a model to describe four waterlilies when only three are present). At low object counts, models often correct the overestimation, but as the number of objects increases, they increasingly conform to the prompt regardless of the discrepancy. Through mechanistic analysis of three VLMs, we identify a small set of attention heads whose ablation substantially reduces prompt-induced hallucinations (PIH) by at least 40% without additional training. Across models, PIH-heads mediate prompt copying in model-specific ways. We characterize these differences and show that PIH ablation increases correction toward visual evidence. Our findings offer insights into the internal mechanisms driving prompt-induced hallucinations, revealing model-specific differences in how these behaviors are implemented.
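The abstract does not say how the attention heads are ablated, so the following is only a minimal sketch of one common choice, zero-ablation, on a toy single-layer self-attention in NumPy. It is not the authors' implementation. The function `attention_with_ablation`, its parameters, the random weights, and the absence of a causal mask are all illustrative assumptions; in a real VLM the same masking would be applied to the selected heads inside the model's own attention layers.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_ablation(x, w_q, w_k, w_v, w_o, n_heads, ablate=()):
    """Toy multi-head self-attention with optional zero-ablation (hypothetical helper).

    x: (seq_len, d_model); w_q, w_k, w_v, w_o: (d_model, d_model).
    ablate: head indices whose output is zeroed before the output projection.
    """
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    def split_heads(w):
        # Project, then reshape to (n_heads, seq_len, d_head).
        return (x @ w).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(w_q), split_heads(w_k), split_heads(w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (n_heads, seq, seq)
    head_out = softmax(scores) @ v                       # (n_heads, seq, d_head)

    # Zero-ablation: multiply each ablated head's output by 0.
    keep = np.ones(n_heads)
    for h in ablate:
        keep[h] = 0.0
    head_out = head_out * keep[:, None, None]

    # Concatenate heads and apply the output projection.
    concat = head_out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o

# Random toy inputs (assumed sizes, not taken from the paper).
rng = np.random.default_rng(0)
d_model, n_heads, seq_len = 8, 4, 5
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v, w_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))

full = attention_with_ablation(x, w_q, w_k, w_v, w_o, n_heads)
ablated = attention_with_ablation(x, w_q, w_k, w_v, w_o, n_heads, ablate=[1])
print("max change from ablating head 1:", np.abs(full - ablated).max())
```

Because the output projection is linear in the concatenated head outputs, the layer output decomposes into a sum of per-head contributions, and zero-ablation removes exactly the contribution of the chosen heads. Other ablation schemes (for example, replacing a head's output with its mean over a reference dataset) would change only the masking step.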

@misc{rudman_mechanisms_2026,
	title = {Mechanisms of {Prompt}-{Induced} {Hallucination} in {Vision}-{Language} {Models}},
	url = {http://arxiv.org/abs/2601.05201},
	doi = {10.48550/arXiv.2601.05201},
	abstract = {Large vision-language models (VLMs) are highly capable, yet often hallucinate by favoring textual prompts over visual evidence. We study this failure mode in a controlled object-counting setting, where the prompt overstates the number of objects in the image (e.g., asking a model to describe four waterlilies when only three are present). At low object counts, models often correct the overestimation, but as the number of objects increases, they increasingly conform to the prompt regardless of the discrepancy. Through mechanistic analysis of three VLMs, we identify a small set of attention heads whose ablation substantially reduces prompt-induced hallucinations (PIH) by at least 40\% without additional training. Across models, PIH-heads mediate prompt copying in model-specific ways. We characterize these differences and show that PIH ablation increases correction toward visual evidence. Our findings offer insights into the internal mechanisms driving prompt-induced hallucinations, revealing model-specific differences in how these behaviors are implemented.},
	language = {en},
	urldate = {2026-04-20},
	publisher = {arXiv},
	author = {Rudman, William and Golovanevsky, Michal and Arad, Dana and Belinkov, Yonatan and Singh, Ritambhara and Eickhoff, Carsten and Mahowald, Kyle},
	month = apr,
	year = {2026},
	note = {arXiv:2601.05201 [cs]},
	keywords = {Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, SYS: CosmicAI Contact Author, WG: Explorable},
}
