Pretraining on the Test Set Is All You Need. Schaeffer, R. September 2023. arXiv:2309.08632 [cs]
@misc{schaeffer_pretraining_2023,
	title = {Pretraining on the {Test} {Set} {Is} {All} {You} {Need}},
	url = {http://arxiv.org/abs/2309.08632},
	doi = {10.48550/arXiv.2309.08632},
	abstract = {Inspired by recent work demonstrating the promise of smaller Transformer-based language models pretrained on carefully curated data, we supercharge such approaches by investing heavily in curating a novel, high quality, non-synthetic data mixture based solely on evaluation benchmarks. Using our novel dataset mixture consisting of less than 100 thousand tokens, we pretrain a 1 million parameter transformer-based LLM \textbf{$\phi$-CTNL} (pronounced ``fictional'') that achieves perfect results across diverse academic benchmarks, strictly outperforming all known foundation models. \textbf{$\phi$-CTNL} also beats power-law scaling and exhibits a never-before-seen grokking-like ability to accurately predict downstream evaluation benchmarks' canaries.},
	urldate = {2023-09-25},
	publisher = {arXiv},
	author = {Schaeffer, Rylan},
	month = sep,
	year = {2023},
	note = {arXiv:2309.08632 [cs]},
	keywords = {Computer Science - Artificial Intelligence, Computer Science - Computation and Language},
}