A Survey of Large Language Models

A Survey of Large Language Models. Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., Du, Y., Yang, C., Chen, Y., Chen, Z., Jiang, J., Ren, R., Li, Y., Tang, X., Liu, Z., Liu, P., Nie, J., & Wen, J. September, 2023. arXiv:2303.18223 [cs]


Ever since the Turing Test was proposed in the 1950s, humans have explored the mastering of language intelligence by machines. Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge to develop capable artificial intelligence (AI) algorithms for comprehending and grasping a language. As a major approach, language modeling has been widely studied for language understanding and generation in the past two decades, evolving from statistical language models to neural language models. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora, showing strong capabilities in solving various natural language processing (NLP) tasks. Since researchers have found that model scaling can lead to improved model capacity, they have further investigated the scaling effect by increasing the parameter scale to an even larger size. Interestingly, when the parameter scale exceeds a certain level, these enlarged language models not only achieve a significant performance improvement, but also exhibit some special abilities (e.g., in-context learning) that are not present in small-scale language models (e.g., BERT). To distinguish language models of different parameter scales, the research community has coined the term large language models (LLMs) for PLMs of significant size (e.g., containing tens or hundreds of billions of parameters). Recently, research on LLMs has been greatly advanced by both academia and industry, and one remarkable advance is the launch of ChatGPT (a powerful AI chatbot developed based on LLMs), which has attracted widespread attention from society. The technical evolution of LLMs has been making an important impact on the entire AI community, which would revolutionize the way we develop and use AI algorithms.
Considering this rapid technical progress, in this survey, we review the recent advances of LLMs by introducing the background, key findings, and mainstream techniques. In particular, we focus on four major aspects of LLMs, namely pre-training, adaptation tuning, utilization, and capacity evaluation. Furthermore, we also summarize the available resources for developing LLMs and discuss the remaining issues for future directions. This survey provides an up-to-date review of the literature on LLMs, which can be a useful resource for both researchers and engineers.

@misc{zhao_survey_2023,
	title = {A {Survey} of {Large} {Language} {Models}},
	url = {http://arxiv.org/abs/2303.18223},
	abstract = {Ever since the Turing Test was proposed in the 1950s, humans have explored the mastering of language intelligence by machines. Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge to develop capable artificial intelligence (AI) algorithms for comprehending and grasping a language. As a major approach, language modeling has been widely studied for language understanding and generation in the past two decades, evolving from statistical language models to neural language models. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora, showing strong capabilities in solving various natural language processing (NLP) tasks. Since researchers have found that model scaling can lead to improved model capacity, they have further investigated the scaling effect by increasing the parameter scale to an even larger size. Interestingly, when the parameter scale exceeds a certain level, these enlarged language models not only achieve a significant performance improvement, but also exhibit some special abilities (e.g., in-context learning) that are not present in small-scale language models (e.g., BERT). To distinguish language models of different parameter scales, the research community has coined the term large language models (LLMs) for PLMs of significant size (e.g., containing tens or hundreds of billions of parameters). Recently, research on LLMs has been greatly advanced by both academia and industry, and one remarkable advance is the launch of ChatGPT (a powerful AI chatbot developed based on LLMs), which has attracted widespread attention from society. The technical evolution of LLMs has been making an important impact on the entire AI community, which would revolutionize the way we develop and use AI algorithms.
Considering this rapid technical progress, in this survey, we review the recent advances of LLMs by introducing the background, key findings, and mainstream techniques. In particular, we focus on four major aspects of LLMs, namely pre-training, adaptation tuning, utilization, and capacity evaluation. Furthermore, we also summarize the available resources for developing LLMs and discuss the remaining issues for future directions. This survey provides an up-to-date review of the literature on LLMs, which can be a useful resource for both researchers and engineers.},
	language = {en},
	urldate = {2023-09-14},
	publisher = {arXiv},
	author = {Zhao, Wayne Xin and Zhou, Kun and Li, Junyi and Tang, Tianyi and Wang, Xiaolei and Hou, Yupeng and Min, Yingqian and Zhang, Beichen and Zhang, Junjie and Dong, Zican and Du, Yifan and Yang, Chen and Chen, Yushuo and Chen, Zhipeng and Jiang, Jinhao and Ren, Ruiyang and Li, Yifan and Tang, Xinyu and Liu, Zikang and Liu, Peiyu and Nie, Jian-Yun and Wen, Ji-Rong},
	month = sep,
	year = {2023},
	note = {arXiv:2303.18223 [cs]},
	keywords = {Computer Science - Artificial Intelligence, Computer Science - Computation and Language},
}

