A Pilot Evaluation of ChatGPT and DALL-E 2 on Decision Making and Spatial Reasoning. Tang, Z. & Kejriwal, M. arXiv.org, February 2023. Place: Ithaca. Publisher: Cornell University Library, arXiv.org.
We conduct a pilot study selectively evaluating the cognitive abilities (decision making and spatial reasoning) of two recently released generative transformer models, ChatGPT and DALL-E 2. Input prompts were constructed following neutral a priori guidelines, rather than adversarial intent. Post hoc qualitative analysis of the outputs shows that DALL-E 2 is able to generate at least one correct image for each spatial reasoning prompt, but most images generated are incorrect (even though the model seems to have a clear understanding of the objects mentioned in the prompt). Similarly, in evaluating ChatGPT on the rationality axioms developed under the classical Von Neumann-Morgenstern utility theorem, we find that, although it demonstrates some level of rational decision-making, many of its decisions violate at least one of the axioms even under reasonable constructions of preferences, bets, and decision-making prompts. ChatGPT's outputs on such problems generally tended to be unpredictable: even as it made irrational decisions (or employed an incorrect reasoning process) for some simpler decision-making problems, it was able to draw correct conclusions for more complex bet structures. We briefly comment on the nuances and challenges involved in scaling up such a 'cognitive' evaluation or conducting it with a closed set of answer keys ('ground truth'), given that these models are inherently generative and open-ended in responding to prompts.
@article{tang_pilot_2023,
	title = {A {Pilot} {Evaluation} of {ChatGPT} and {DALL}-{E} 2 on {Decision} {Making} and {Spatial} {Reasoning}},
	url = {https://www.proquest.com/working-papers/pilot-evaluation-chatgpt-dall-e-2-on-decision/docview/2778490452/se-2},
	abstract = {We conduct a pilot study selectively evaluating the cognitive abilities (decision making and spatial reasoning) of two recently released generative transformer models, ChatGPT and DALL-E 2. Input prompts were constructed following neutral a priori guidelines, rather than adversarial intent. Post hoc qualitative analysis of the outputs shows that DALL-E 2 is able to generate at least one correct image for each spatial reasoning prompt, but most images generated are incorrect (even though the model seems to have a clear understanding of the objects mentioned in the prompt). Similarly, in evaluating ChatGPT on the rationality axioms developed under the classical Von Neumann-Morgenstern utility theorem, we find that, although it demonstrates some level of rational decision-making, many of its decisions violate at least one of the axioms even under reasonable constructions of preferences, bets, and decision-making prompts. ChatGPT's outputs on such problems generally tended to be unpredictable: even as it made irrational decisions (or employed an incorrect reasoning process) for some simpler decision-making problems, it was able to draw correct conclusions for more complex bet structures. We briefly comment on the nuances and challenges involved in scaling up such a 'cognitive' evaluation or conducting it with a closed set of answer keys ('ground truth'), given that these models are inherently generative and open-ended in responding to prompts.},
	language = {English},
	journal = {arXiv.org},
	author = {Tang, Zhisheng and Kejriwal, Mayank},
	month = feb,
	year = {2023},
	note = {Place: Ithaca. Publisher: Cornell University Library, arXiv.org},
	keywords = {Artificial intelligence, Reasoning, Chatbots, Business And Economics--Banking And Finance, Computer Vision and Pattern Recognition, Computation and Language, Cognition \& reasoning, Qualitative analysis, Decision making, Axioms, Cognitive ability},
	annote = {Copyright - © 2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”).  Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.},
	annote = {Last updated - 2023-02-22},
}
