Llemma: An Open Language Model For Mathematics. Azerbayev, Z., Schoelkopf, H., Paster, K., Santos, M. D., McAleer, S., Jiang, A. Q., Deng, J., Biderman, S., & Welleck, S. November, 2023. arXiv:2310.10631 [cs]
Paper doi abstract bibtex We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further finetuning. We openly release all artifacts, including 7 billion and 34 billion parameter models, the Proof-Pile-2, and code to replicate our experiments.
@misc{azerbayev_llemma_2023,
	title        = {{Llemma}: An Open Language Model for Mathematics},
	shorttitle   = {Llemma},
	author       = {Azerbayev, Zhangir and Schoelkopf, Hailey and Paster, Keiran and Dos Santos, Marco and McAleer, Stephen and Jiang, Albert Q. and Deng, Jia and Biderman, Stella and Welleck, Sean},
	year         = {2023},
	month        = nov,
	publisher    = {arXiv},
	eprint       = {2310.10631},
	eprinttype   = {arXiv},
	eprintclass  = {cs.CL},
	doi          = {10.48550/arXiv.2310.10631},
	url          = {http://arxiv.org/abs/2310.10631},
	urldate      = {2024-01-16},
	abstract     = {We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further finetuning. We openly release all artifacts, including 7 billion and 34 billion parameter models, the Proof-Pile-2, and code to replicate our experiments.},
	keywords     = {artificial intelligence, computation and language, large language models, mentions sympy},
}
Downloads: 0
{"_id":"MBB7QAEXSQuoni3nS","bibbaseid":"azerbayev-schoelkopf-paster-santos-mcaleer-jiang-deng-biderman-etal-llemmaanopenlanguagemodelformathematics-2023","author_short":["Azerbayev, Z.","Schoelkopf, H.","Paster, K.","Santos, M. D.","McAleer, S.","Jiang, A. Q.","Deng, J.","Biderman, S.","Welleck, S."],"bibdata":{"bibtype":"misc","type":"misc","title":"Llemma: An Open Language Model For Mathematics","shorttitle":"Llemma","url":"http://arxiv.org/abs/2310.10631","doi":"10.48550/arXiv.2310.10631","abstract":"We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further finetuning. We openly release all artifacts, including 7 billion and 34 billion parameter models, the Proof-Pile-2, and code to replicate our experiments.","urldate":"2024-01-16","publisher":"arXiv","author":[{"propositions":[],"lastnames":["Azerbayev"],"firstnames":["Zhangir"],"suffixes":[]},{"propositions":[],"lastnames":["Schoelkopf"],"firstnames":["Hailey"],"suffixes":[]},{"propositions":[],"lastnames":["Paster"],"firstnames":["Keiran"],"suffixes":[]},{"propositions":[],"lastnames":["Santos"],"firstnames":["Marco","Dos"],"suffixes":[]},{"propositions":[],"lastnames":["McAleer"],"firstnames":["Stephen"],"suffixes":[]},{"propositions":[],"lastnames":["Jiang"],"firstnames":["Albert","Q."],"suffixes":[]},{"propositions":[],"lastnames":["Deng"],"firstnames":["Jia"],"suffixes":[]},{"propositions":[],"lastnames":["Biderman"],"firstnames":["Stella"],"suffixes":[]},{"propositions":[],"lastnames":["Welleck"],"firstnames":["Sean"],"suffixes":[]}],"month":"November","year":"2023","note":"arXiv:2310.10631 [cs]","keywords":"artificial 
intelligence, computation and language, large language models, mentions sympy","bibtex":"@misc{azerbayev_llemma_2023,\n\ttitle = {Llemma: {An} {Open} {Language} {Model} {For} {Mathematics}},\n\tshorttitle = {Llemma},\n\turl = {http://arxiv.org/abs/2310.10631},\n\tdoi = {10.48550/arXiv.2310.10631},\n\tabstract = {We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further finetuning. We openly release all artifacts, including 7 billion and 34 billion parameter models, the Proof-Pile-2, and code to replicate our experiments.},\n\turldate = {2024-01-16},\n\tpublisher = {arXiv},\n\tauthor = {Azerbayev, Zhangir and Schoelkopf, Hailey and Paster, Keiran and Santos, Marco Dos and McAleer, Stephen and Jiang, Albert Q. and Deng, Jia and Biderman, Stella and Welleck, Sean},\n\tmonth = nov,\n\tyear = {2023},\n\tnote = {arXiv:2310.10631 [cs]},\n\tkeywords = {artificial intelligence, computation and language, large language models, mentions sympy},\n}\n\n\n\n\n\n\n\n\n\n\n\n","author_short":["Azerbayev, Z.","Schoelkopf, H.","Paster, K.","Santos, M. D.","McAleer, S.","Jiang, A. 
Q.","Deng, J.","Biderman, S.","Welleck, S."],"key":"azerbayev_llemma_2023","id":"azerbayev_llemma_2023","bibbaseid":"azerbayev-schoelkopf-paster-santos-mcaleer-jiang-deng-biderman-etal-llemmaanopenlanguagemodelformathematics-2023","role":"author","urls":{"Paper":"http://arxiv.org/abs/2310.10631"},"keyword":["artificial intelligence","computation and language","large language models","mentions sympy"],"metadata":{"authorlinks":{}}},"bibtype":"misc","biburl":"https://bibbase.org/zotero-group/nicoguaro/525293","dataSources":["YtBDXPDiQEyhyEDZC"],"keywords":["artificial intelligence","computation and language","large language models","mentions sympy"],"search_terms":["llemma","open","language","model","mathematics","azerbayev","schoelkopf","paster","santos","mcaleer","jiang","deng","biderman","welleck"],"title":"Llemma: An Open Language Model For Mathematics","year":2023}