LLaMA: Open and Efficient Foundation Language Models. Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., Rodriguez, A., Joulin, A., Grave, E., & Lample, G. February 2023. arXiv:2302.13971 [cs].
Abstract: We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.
@misc{touvron_llama_2023,
title = {{LLaMA}: {Open} and {Efficient} {Foundation} {Language} {Models}},
shorttitle = {{LLaMA}},
url = {http://arxiv.org/abs/2302.13971},
doi = {10.48550/arXiv.2302.13971},
abstract = {We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.},
urldate = {2024-09-24},
publisher = {arXiv},
author = {Touvron, Hugo and Lavril, Thibaut and Izacard, Gautier and Martinet, Xavier and Lachaux, Marie-Anne and Lacroix, Timothée and Rozière, Baptiste and Goyal, Naman and Hambro, Eric and Azhar, Faisal and Rodriguez, Aurelien and Joulin, Armand and Grave, Edouard and Lample, Guillaume},
month = feb,
year = {2023},
note = {arXiv:2302.13971 [cs]},
keywords = {Computer Science - Computation and Language},
}
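To cite this entry from a LaTeX document, save the BibTeX record above to a .bib file and reference it by its key, touvron_llama_2023. The sketch below is a minimal, hedged example: the file name references.bib and the plain bibliography style are assumptions, not part of the original record.

\documentclass{article}
\begin{document}
LLaMA shows that strong foundation models can be trained on publicly
available data alone~\cite{touvron_llama_2023}.
\bibliographystyle{plain}
% Assumes the BibTeX entry above was saved as references.bib
\bibliography{references}
\end{document}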
{"_id":"gzMStWnW4kMiTdK82","bibbaseid":"touvron-lavril-izacard-martinet-lachaux-lacroix-rozire-goyal-etal-llamaopenandefficientfoundationlanguagemodels-2023","author_short":["Touvron, H.","Lavril, T.","Izacard, G.","Martinet, X.","Lachaux, M.","Lacroix, T.","Rozière, B.","Goyal, N.","Hambro, E.","Azhar, F.","Rodriguez, A.","Joulin, A.","Grave, E.","Lample, G."],"bibdata":{"bibtype":"misc","type":"misc","title":"LLaMA: Open and Efficient Foundation Language Models","shorttitle":"LLaMA","url":"http://arxiv.org/abs/2302.13971","doi":"10.48550/arXiv.2302.13971","abstract":"We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.","urldate":"2024-09-24","publisher":"arXiv","author":[{"propositions":[],"lastnames":["Touvron"],"firstnames":["Hugo"],"suffixes":[]},{"propositions":[],"lastnames":["Lavril"],"firstnames":["Thibaut"],"suffixes":[]},{"propositions":[],"lastnames":["Izacard"],"firstnames":["Gautier"],"suffixes":[]},{"propositions":[],"lastnames":["Martinet"],"firstnames":["Xavier"],"suffixes":[]},{"propositions":[],"lastnames":["Lachaux"],"firstnames":["Marie-Anne"],"suffixes":[]},{"propositions":[],"lastnames":["Lacroix"],"firstnames":["Timothée"],"suffixes":[]},{"propositions":[],"lastnames":["Rozière"],"firstnames":["Baptiste"],"suffixes":[]},{"propositions":[],"lastnames":["Goyal"],"firstnames":["Naman"],"suffixes":[]},{"propositions":[],"lastnames":["Hambro"],"firstnames":["Eric"],"suffixes":[]},{"propositions":[],"lastnames":["Azhar"],"firstnames":["Faisal"],"suffixes":[]},{"propositions":[],"lastnames":["Rodriguez"],"firstnames":["Aurelien"],"suffixes":[]},{"propositions":[],"lastnames":["Joulin"],"firstnames":["Armand"],"suffixes":[]},{"propositions":[],"lastnames":["Grave"],"firstnames":["Edouard"],"suffixes":[]},{"propositions":[],"lastnames":["Lample"],"firstnames":["Guillaume"],"suffixes":[]}],"month":"February","year":"2023","note":"arXiv:2302.13971 [cs]","keywords":"Computer Science - Computation and Language","bibtex":"@misc{touvron_llama_2023,\n\ttitle = {{LLaMA}: {Open} and {Efficient} {Foundation} {Language} {Models}},\n\tshorttitle = {{LLaMA}},\n\turl = {http://arxiv.org/abs/2302.13971},\n\tdoi = {10.48550/arXiv.2302.13971},\n\tabstract = {We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. 
We release all our models to the research community.},\n\turldate = {2024-09-24},\n\tpublisher = {arXiv},\n\tauthor = {Touvron, Hugo and Lavril, Thibaut and Izacard, Gautier and Martinet, Xavier and Lachaux, Marie-Anne and Lacroix, Timothée and Rozière, Baptiste and Goyal, Naman and Hambro, Eric and Azhar, Faisal and Rodriguez, Aurelien and Joulin, Armand and Grave, Edouard and Lample, Guillaume},\n\tmonth = feb,\n\tyear = {2023},\n\tnote = {arXiv:2302.13971 [cs]},\n\tkeywords = {Computer Science - Computation and Language},\n}\n\n\n\n\n\n\n\n","author_short":["Touvron, H.","Lavril, T.","Izacard, G.","Martinet, X.","Lachaux, M.","Lacroix, T.","Rozière, B.","Goyal, N.","Hambro, E.","Azhar, F.","Rodriguez, A.","Joulin, A.","Grave, E.","Lample, G."],"key":"touvron_llama_2023","id":"touvron_llama_2023","bibbaseid":"touvron-lavril-izacard-martinet-lachaux-lacroix-rozire-goyal-etal-llamaopenandefficientfoundationlanguagemodels-2023","role":"author","urls":{"Paper":"http://arxiv.org/abs/2302.13971"},"keyword":["Computer Science - Computation and Language"],"metadata":{"authorlinks":{}},"html":""},"bibtype":"misc","biburl":"https://bibbase.org/zotero/saurabhr","dataSources":["Wsv2bQ4jPuc7qme8R","nxjWwW7fWbb5tfpKz"],"keywords":["computer science - computation and language"],"search_terms":["llama","open","efficient","foundation","language","models","touvron","lavril","izacard","martinet","lachaux","lacroix","rozière","goyal","hambro","azhar","rodriguez","joulin","grave","lample"],"title":"LLaMA: Open and Efficient Foundation Language Models","year":2023}