Pretraining on the Test Set Is All You Need. Schaeffer, R. September 2023. arXiv:2309.08632 [cs]
Abstract: Inspired by recent work demonstrating the promise of smaller Transformer-based language models pretrained on carefully curated data, we supercharge such approaches by investing heavily in curating a novel, high-quality, non-synthetic data mixture based solely on evaluation benchmarks. Using our novel dataset mixture consisting of less than 100 thousand tokens, we pretrain a 1 million parameter transformer-based LLM φ-CTNL (pronounced "fictional") that achieves perfect results across diverse academic benchmarks, strictly outperforming all known foundation models. φ-CTNL also beats power-law scaling and exhibits a never-before-seen grokking-like ability to accurately predict downstream evaluation benchmarks' canaries.
@misc{schaeffer_pretraining_2023,
title = {Pretraining on the {Test} {Set} {Is} {All} {You} {Need}},
url = {http://arxiv.org/abs/2309.08632},
doi = {10.48550/arXiv.2309.08632},
abstract = {Inspired by recent work demonstrating the promise of smaller Transformer-based language models pretrained on carefully curated data, we supercharge such approaches by investing heavily in curating a novel, high quality, non-synthetic data mixture based solely on evaluation benchmarks. Using our novel dataset mixture consisting of less than 100 thousand tokens, we pretrain a 1 million parameter transformer-based LLM \textbf{$\phi$-CTNL} (pronounced ``fictional'') that achieves perfect results across diverse academic benchmarks, strictly outperforming all known foundation models. \textbf{$\phi$-CTNL} also beats power-law scaling and exhibits a never-before-seen grokking-like ability to accurately predict downstream evaluation benchmarks' canaries.},
urldate = {2023-09-25},
publisher = {arXiv},
author = {Schaeffer, Rylan},
month = sep,
year = {2023},
note = {arXiv:2309.08632 [cs]},
keywords = {Computer Science - Artificial Intelligence, Computer Science - Computation and Language},
}
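For intuition about the (satirical) recipe the abstract describes, here is a minimal sketch of pretraining a tiny character-level transformer directly on benchmark-style text. Everything in it is an assumption made for illustration: the toy benchmark strings, the model size, and the hyperparameters are invented and are not the authors' actual setup.

# Illustrative sketch only: "pretraining" a tiny character-level transformer
# directly on (toy, made-up) benchmark test items, echoing the paper's
# satirical premise. Model size, data, and hyperparameters are assumptions.
import torch
import torch.nn as nn

# Toy stand-in for "evaluation benchmark" text; the paper's actual mixture is not public.
benchmark_text = (
    "Q: What is the capital of France? A: Paris\n"
    "Q: 2 + 2 = ? A: 4\n"
) * 50

chars = sorted(set(benchmark_text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in benchmark_text], dtype=torch.long)

class TinyLM(nn.Module):
    def __init__(self, vocab, d_model=64, n_layers=2, n_heads=4, ctx=64):
        super().__init__()
        self.ctx = ctx
        self.tok = nn.Embedding(vocab, d_model)
        self.pos = nn.Embedding(ctx, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, idx):
        b, t = idx.shape
        x = self.tok(idx) + self.pos(torch.arange(t, device=idx.device))
        # Causal mask so the model predicts the next character only.
        mask = nn.Transformer.generate_square_subsequent_mask(t).to(idx.device)
        return self.head(self.blocks(x, mask=mask))

model = TinyLM(len(chars))
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

# "Pretrain" by memorizing the benchmark text itself.
for step in range(200):
    i = torch.randint(0, len(data) - model.ctx - 1, (8,))
    x = torch.stack([data[j:j + model.ctx] for j in i])
    y = torch.stack([data[j + 1:j + model.ctx + 1] for j in i])
    loss = nn.functional.cross_entropy(model(x).flatten(0, 1), y.flatten())
    opt.zero_grad()
    loss.backward()
    opt.step()

In the spirit of the paper, this "pretraining" loop simply memorizes its own evaluation data, which is exactly why the resulting perfect benchmark scores would be meaningless.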
{"_id":"dTNfGNFBRpmfnFzvd","bibbaseid":"schaeffer-pretrainingonthetestsetisallyouneed-2023","author_short":["Schaeffer, R."],"bibdata":{"bibtype":"misc","type":"misc","title":"Pretraining on the Test Set Is All You Need","url":"http://arxiv.org/abs/2309.08632","doi":"10.48550/arXiv.2309.08632","abstract":"Inspired by recent work demonstrating the promise of smaller Transformer-based language models pretrained on carefully curated data, we supercharge such approaches by investing heavily in curating a novel, high quality, non-synthetic data mixture based solely on evaluation benchmarks. Using our novel dataset mixture consisting of less than 100 thousand tokens, we pretrain a 1 million parameter transformer-based LLM \\textbf\\phi-CTNL\\ (pronounced ``fictional\") that achieves perfect results across diverse academic benchmarks, strictly outperforming all known foundation models. \\textbf\\phi-CTNL\\ also beats power-law scaling and exhibits a never-before-seen grokking-like ability to accurately predict downstream evaluation benchmarks' canaries.","urldate":"2023-09-25","publisher":"arXiv","author":[{"propositions":[],"lastnames":["Schaeffer"],"firstnames":["Rylan"],"suffixes":[]}],"month":"September","year":"2023","note":"arXiv:2309.08632 [cs]","keywords":"Computer Science - Artificial Intelligence, Computer Science - Computation and Language","bibtex":"@misc{schaeffer_pretraining_2023,\n\ttitle = {Pretraining on the {Test} {Set} {Is} {All} {You} {Need}},\n\turl = {http://arxiv.org/abs/2309.08632},\n\tdoi = {10.48550/arXiv.2309.08632},\n\tabstract = {Inspired by recent work demonstrating the promise of smaller Transformer-based language models pretrained on carefully curated data, we supercharge such approaches by investing heavily in curating a novel, high quality, non-synthetic data mixture based solely on evaluation benchmarks. Using our novel dataset mixture consisting of less than 100 thousand tokens, we pretrain a 1 million parameter transformer-based LLM {\\textbackslash}textbf\\{phi-CTNL\\} (pronounced ``fictional\") that achieves perfect results across diverse academic benchmarks, strictly outperforming all known foundation models. {\\textbackslash}textbf\\{phi-CTNL\\} also beats power-law scaling and exhibits a never-before-seen grokking-like ability to accurately predict downstream evaluation benchmarks' canaries.},\n\turldate = {2023-09-25},\n\tpublisher = {arXiv},\n\tauthor = {Schaeffer, Rylan},\n\tmonth = sep,\n\tyear = {2023},\n\tnote = {arXiv:2309.08632 [cs]},\n\tkeywords = {Computer Science - Artificial Intelligence, Computer Science - Computation and Language},\n}\n\n\n\n\n\n\n\n\n\n\n\n","author_short":["Schaeffer, R."],"key":"schaeffer_pretraining_2023","id":"schaeffer_pretraining_2023","bibbaseid":"schaeffer-pretrainingonthetestsetisallyouneed-2023","role":"author","urls":{"Paper":"http://arxiv.org/abs/2309.08632"},"keyword":["Computer Science - Artificial Intelligence","Computer Science - Computation and Language"],"metadata":{"authorlinks":{}}},"bibtype":"misc","biburl":"https://bibbase.org/zotero/abhishek-p","dataSources":["h7kKWXpJh2iaX92T5"],"keywords":["computer science - artificial intelligence","computer science - computation and language"],"search_terms":["pretraining","test","set","need","schaeffer"],"title":"Pretraining on the Test Set Is All You Need","year":2023}