A Pilot Evaluation of ChatGPT and DALL-E 2 on Decision Making and Spatial Reasoning. Tang, Z. & Kejriwal, M. arXiv.org, February 2023. Cornell University Library, Ithaca.
Abstract: We conduct a pilot study selectively evaluating the cognitive abilities (decision making and spatial reasoning) of two recently released generative transformer models, ChatGPT and DALL-E 2. Input prompts were constructed following neutral a priori guidelines, rather than adversarial intent. Post hoc qualitative analysis of the outputs shows that DALL-E 2 is able to generate at least one correct image for each spatial reasoning prompt, but most images generated are incorrect (even though the model seems to have a clear understanding of the objects mentioned in the prompt). Similarly, in evaluating ChatGPT on the rationality axioms developed under the classical Von Neumann-Morgenstern utility theorem, we find that, although it demonstrates some level of rational decision-making, many of its decisions violate at least one of the axioms even under reasonable constructions of preferences, bets, and decision-making prompts. ChatGPT's outputs on such problems generally tended to be unpredictable: even as it made irrational decisions (or employed an incorrect reasoning process) for some simpler decision-making problems, it was able to draw correct conclusions for more complex bet structures. We briefly comment on the nuances and challenges involved in scaling up such a 'cognitive' evaluation or conducting it with a closed set of answer keys ('ground truth'), given that these models are inherently generative and open-ended in responding to prompts.
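For readers unfamiliar with the framework the abstract invokes, the Von Neumann-Morgenstern rationality axioms can be stated as follows for a preference relation ⪰ over lotteries L, M, N (this is the standard textbook formulation, not a restatement of the paper's own prompts or bet constructions):

    Completeness:  L ⪰ M or M ⪰ L.
    Transitivity:  if L ⪰ M and M ⪰ N, then L ⪰ N.
    Continuity:    if L ⪰ M ⪰ N, there exists p ∈ [0, 1] such that pL + (1 − p)N ∼ M.
    Independence:  L ⪰ M if and only if pL + (1 − p)N ⪰ pM + (1 − p)N for every lottery N and every p ∈ (0, 1].

A set of choices that violates any one of these axioms (for example, a cyclic preference A ≻ B, B ≻ C, C ≻ A, which breaks transitivity) cannot be represented as maximizing an expected-utility function; this is the sense in which the paper labels some of ChatGPT's decisions irrational.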
@article{tang_pilot_2023,
title = {A {Pilot} {Evaluation} of {ChatGPT} and {DALL}-{E} 2 on {Decision} {Making} and {Spatial} {Reasoning}},
url = {https://www.proquest.com/working-papers/pilot-evaluation-chatgpt-dall-e-2-on-decision/docview/2778490452/se-2},
abstract = {We conduct a pilot study selectively evaluating the cognitive abilities (decision making and spatial reasoning) of two recently released generative transformer models, ChatGPT and DALL-E 2. Input prompts were constructed following neutral a priori guidelines, rather than adversarial intent. Post hoc qualitative analysis of the outputs shows that DALL-E 2 is able to generate at least one correct image for each spatial reasoning prompt, but most images generated are incorrect (even though the model seems to have a clear understanding of the objects mentioned in the prompt). Similarly, in evaluating ChatGPT on the rationality axioms developed under the classical Von Neumann-Morgenstern utility theorem, we find that, although it demonstrates some level of rational decision-making, many of its decisions violate at least one of the axioms even under reasonable constructions of preferences, bets, and decision-making prompts. ChatGPT's outputs on such problems generally tended to be unpredictable: even as it made irrational decisions (or employed an incorrect reasoning process) for some simpler decision-making problems, it was able to draw correct conclusions for more complex bet structures. We briefly comment on the nuances and challenges involved in scaling up such a 'cognitive' evaluation or conducting it with a closed set of answer keys ('ground truth'), given that these models are inherently generative and open-ended in responding to prompts.},
language = {English},
journal = {arXiv.org},
author = {Tang, Zhisheng and Kejriwal, Mayank},
month = feb,
year = {2023},
note = {Place: Ithaca
Publisher: Cornell University Library, arXiv.org},
keywords = {Artificial intelligence, Reasoning, Chatbots, Business And Economics--Banking And Finance, Computer Vision and Pattern Recognition, Computation and Language, Cognition \& reasoning, Qualitative analysis, Decision making, Axioms, Cognitive ability},
annote = {Copyright - © 2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. Last updated - 2023-02-22},
}
{"_id":"56gnbRhDKbLZr6PQj","bibbaseid":"tang-kejriwal-apilotevaluationofchatgptanddalle2ondecisionmakingandspatialreasoning-2023","author_short":["Tang, Z.","Kejriwal, M."],"bibdata":{"bibtype":"article","type":"article","title":"A Pilot Evaluation of ChatGPT and DALL-E 2 on Decision Making and Spatial Reasoning","url":"https://www.proquest.com/working-papers/pilot-evaluation-chatgpt-dall-e-2-on-decision/docview/2778490452/se-2","abstract":"We conduct a pilot study selectively evaluating the cognitive abilities (decision making and spatial reasoning) of two recently released generative transformer models, ChatGPT and DALL-E 2. Input prompts were constructed following neutral a priori guidelines, rather than adversarial intent. Post hoc qualitative analysis of the outputs shows that DALL-E 2 is able to generate at least one correct image for each spatial reasoning prompt, but most images generated are incorrect (even though the model seems to have a clear understanding of the objects mentioned in the prompt). Similarly, in evaluating ChatGPT on the rationality axioms developed under the classical Von Neumann-Morgenstern utility theorem, we find that, although it demonstrates some level of rational decision-making, many of its decisions violate at least one of the axioms even under reasonable constructions of preferences, bets, and decision-making prompts. ChatGPT's outputs on such problems generally tended to be unpredictable: even as it made irrational decisions (or employed an incorrect reasoning process) for some simpler decision-making problems, it was able to draw correct conclusions for more complex bet structures. We briefly comment on the nuances and challenges involved in scaling up such a 'cognitive' evaluation or conducting it with a closed set of answer keys ('ground truth'), given that these models are inherently generative and open-ended in responding to prompts.","language":"English","journal":"arXiv.org","author":[{"propositions":[],"lastnames":["Tang"],"firstnames":["Zhisheng"],"suffixes":[]},{"propositions":[],"lastnames":["Kejriwal"],"firstnames":["Mayank"],"suffixes":[]}],"month":"February","year":"2023","note":"Place: Ithaca Publisher: Cornell University Library, arXiv.org","keywords":"Artificial intelligence, Reasoning, Chatbots, Artificial Intelligence, Business And Economics–Banking And Finance, Computer Vision and Pattern Recognition, Computation and Language, Cognition & reasoning, Qualitative analysis, Decision making, Axioms, Cognitive ability","annote":"Última actualización - 2023-02-22","bibtex":"@article{tang_pilot_2023,\n\ttitle = {A {Pilot} {Evaluation} of {ChatGPT} and {DALL}-{E} 2 on {Decision} {Making} and {Spatial} {Reasoning}},\n\turl = {https://www.proquest.com/working-papers/pilot-evaluation-chatgpt-dall-e-2-on-decision/docview/2778490452/se-2},\n\tabstract = {We conduct a pilot study selectively evaluating the cognitive abilities (decision making and spatial reasoning) of two recently released generative transformer models, ChatGPT and DALL-E 2. Input prompts were constructed following neutral a priori guidelines, rather than adversarial intent. Post hoc qualitative analysis of the outputs shows that DALL-E 2 is able to generate at least one correct image for each spatial reasoning prompt, but most images generated are incorrect (even though the model seems to have a clear understanding of the objects mentioned in the prompt). 
Similarly, in evaluating ChatGPT on the rationality axioms developed under the classical Von Neumann-Morgenstern utility theorem, we find that, although it demonstrates some level of rational decision-making, many of its decisions violate at least one of the axioms even under reasonable constructions of preferences, bets, and decision-making prompts. ChatGPT's outputs on such problems generally tended to be unpredictable: even as it made irrational decisions (or employed an incorrect reasoning process) for some simpler decision-making problems, it was able to draw correct conclusions for more complex bet structures. We briefly comment on the nuances and challenges involved in scaling up such a 'cognitive' evaluation or conducting it with a closed set of answer keys ('ground truth'), given that these models are inherently generative and open-ended in responding to prompts.},\n\tlanguage = {English},\n\tjournal = {arXiv.org},\n\tauthor = {Tang, Zhisheng and Kejriwal, Mayank},\n\tmonth = feb,\n\tyear = {2023},\n\tnote = {Place: Ithaca\nPublisher: Cornell University Library, arXiv.org},\n\tkeywords = {Artificial intelligence, Reasoning, Chatbots, Artificial Intelligence, Business And Economics--Banking And Finance, Computer Vision and Pattern Recognition, Computation and Language, Cognition \\& reasoning, Qualitative analysis, Decision making, Axioms, Cognitive ability},\n\tannote = {Copyright - © 2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.},\n\tannote = {Última actualización - 2023-02-22},\n}\n\n","author_short":["Tang, Z.","Kejriwal, M."],"key":"tang_pilot_2023","id":"tang_pilot_2023","bibbaseid":"tang-kejriwal-apilotevaluationofchatgptanddalle2ondecisionmakingandspatialreasoning-2023","role":"author","urls":{"Paper":"https://www.proquest.com/working-papers/pilot-evaluation-chatgpt-dall-e-2-on-decision/docview/2778490452/se-2"},"keyword":["Artificial intelligence","Reasoning","Chatbots","Artificial Intelligence","Business And Economics–Banking And Finance","Computer Vision and Pattern Recognition","Computation and Language","Cognition & reasoning","Qualitative analysis","Decision making","Axioms","Cognitive ability"],"metadata":{"authorlinks":{}}},"bibtype":"article","biburl":"https://bibbase.org/network/files/22WYpzbBvi3hDHX7Y","dataSources":["cYu6uhMkeFHgRrEty","hLMh7bwHyFsPNWAEL","LKW3iRvnztCpLNTW7","TLD9JxqHfSQQ4r268","X9BvByJrC3kGJexn8","iovNvcnNYDGJcuMq2","NjZJ5ZmWhTtMZBfje","SjrwGAA7ah7PjkNNm","E7HtrXAfg4zKQPPmh","hHm68SQS8MggmNLuN"],"keywords":["artificial intelligence","reasoning","chatbots","artificial intelligence","business and economics–banking and finance","computer vision and pattern recognition","computation and language","cognition & reasoning","qualitative analysis","decision making","axioms","cognitive ability"],"search_terms":["pilot","evaluation","chatgpt","dall","decision","making","spatial","reasoning","tang","kejriwal"],"title":"A Pilot Evaluation of ChatGPT and DALL-E 2 on Decision Making and Spatial Reasoning","year":2023}