Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. January, 2023. arXiv:2201.11903 [cs]

We explore how generating a chain of thought – a series of intermediate reasoning steps – significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain of thought prompting, where a few chain of thought demonstrations are provided as exemplars in prompting. Experiments on three large language models show that chain of thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. The empirical gains can be striking. For instance, prompting a 540B-parameter language model with just eight chain of thought exemplars achieves state of the art accuracy on the GSM8K benchmark of math word problems, surpassing even finetuned GPT-3 with a verifier.
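As a minimal sketch of the method the abstract describes, the Python below assembles a few-shot prompt in which each exemplar answer spells out its intermediate reasoning before stating the final answer. The names `Exemplar`, `build_cot_prompt`, and `extract_answer`, the "Q:/A:" layout, and the "The answer is" convention are assumptions made for illustration, and the two exemplars are illustrative rather than taken from the abstract (which reports using eight). No model call is shown, since the abstract does not specify an API.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Exemplar:
    """One demonstration: a question, its reasoning steps, and the final answer."""
    question: str
    chain_of_thought: str  # intermediate reasoning steps in natural language
    answer: str


# Illustrative exemplars (hypothetical; a real prompt might use more of them).
EXEMPLARS = [
    Exemplar(
        question="Roger has 5 tennis balls. He buys 2 more cans of tennis "
                 "balls. Each can has 3 tennis balls. How many tennis balls "
                 "does he have now?",
        chain_of_thought="Roger started with 5 balls. 2 cans of 3 tennis "
                         "balls each is 6 tennis balls. 5 + 6 = 11.",
        answer="11",
    ),
    Exemplar(
        question="A shelf holds 4 rows of 6 books. 5 books are removed. "
                 "How many books remain?",
        chain_of_thought="4 rows of 6 books is 24 books. 24 - 5 = 19.",
        answer="19",
    ),
]


def build_cot_prompt(exemplars: List[Exemplar], question: str) -> str:
    """Join the exemplars and the new question into a single prompt string.

    Each exemplar answer contains the reasoning followed by the final answer;
    the new question ends with an open "A:" for the model to complete.
    """
    blocks = [
        f"Q: {ex.question}\nA: {ex.chain_of_thought} The answer is {ex.answer}."
        for ex in exemplars
    ]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


def extract_answer(completion: str) -> Optional[str]:
    """Return the text after "The answer is", or None if the phrase is absent."""
    match = re.search(r"The answer is\s+(.+?)\.(?:\s|$)", completion)
    return match.group(1) if match else None


if __name__ == "__main__":
    prompt = build_cot_prompt(
        EXEMPLARS,
        "The cafeteria had 23 apples. If they used 20 to make lunch and "
        "bought 6 more, how many apples do they have?",
    )
    print(prompt)
```

The resulting string would be sent to a language model as-is, and `extract_answer` applied to the completion. A standard few-shot baseline would use the same layout with the `chain_of_thought` text left out of each exemplar answer.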

@misc{wei2023ChainofThought,
	title = {Chain-of-{Thought} {Prompting} {Elicits} {Reasoning} in {Large} {Language} {Models}},
	url = {http://arxiv.org/abs/2201.11903},
	doi = {10.48550/arXiv.2201.11903},
	abstract = {We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain of thought prompting, where a few chain of thought demonstrations are provided as exemplars in prompting. Experiments on three large language models show that chain of thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. The empirical gains can be striking. For instance, prompting a 540B-parameter language model with just eight chain of thought exemplars achieves state of the art accuracy on the GSM8K benchmark of math word problems, surpassing even finetuned GPT-3 with a verifier.},
	urldate = {2026-03-03},
	publisher = {arXiv},
	author = {Wei, Jason and Wang, Xuezhi and Schuurmans, Dale and Bosma, Maarten and Ichter, Brian and Xia, Fei and Chi, Ed and Le, Quoc and Zhou, Denny},
	month = jan,
	year = {2023},
	note = {arXiv:2201.11903 [cs]},
	keywords = {Reasoning},
}
