Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach. Geiping, J., McLeish, S., Jain, N., Kirchenbauer, J., Singh, S., Bartoldson, B. R., Kailkhura, B., Bhatele, A., & Goldstein, T. February, 2025. arXiv:2502.05171 [cs]
We study a novel language model architecture that is capable of scaling test-time computation by implicitly reasoning in latent space. Our model works by iterating a recurrent block, thereby unrolling to arbitrary depth at test-time. This stands in contrast to mainstream reasoning models that scale up compute by producing more tokens. Unlike approaches based on chain-of-thought, our approach does not require any specialized training data, can work with small context windows, and can capture types of reasoning that are not easily represented in words. We scale a proof-of-concept model to 3.5 billion parameters and 800 billion tokens. We show that the resulting model can improve its performance on reasoning benchmarks, sometimes dramatically, up to a computation load equivalent to 50 billion parameters.
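For orientation, here is a minimal sketch of the core idea from the abstract: a shared block is iterated in latent space, so test-time depth (and compute) can be increased without generating extra tokens or adding parameters. The module names, shapes, and the use of a plain Transformer layer below are illustrative assumptions, not the authors' released architecture or code.

```python
import torch
import torch.nn as nn

class RecurrentDepthLM(nn.Module):
    """Illustrative latent recurrent-depth model: embed the input once,
    iterate a single shared block on a latent state, then decode.
    All names and hyperparameters here are assumptions for illustration."""

    def __init__(self, vocab_size: int, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # One shared block whose weights are reused at every iteration,
        # so depth at test time is not tied to the parameter count.
        self.core = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens: torch.Tensor, num_iterations: int) -> torch.Tensor:
        e = self.embed(tokens)            # fixed input injection
        s = torch.randn_like(e)           # random initial latent state
        for _ in range(num_iterations):   # unroll the block to arbitrary depth
            s = self.core(s + e)          # re-inject the input at each step
        return self.lm_head(s)            # logits over the vocabulary

# The same parameters serve both budgets; more iterations = more test-time compute.
model = RecurrentDepthLM(vocab_size=32000)
x = torch.randint(0, 32000, (1, 16))
logits_shallow = model(x, num_iterations=4)
logits_deep = model(x, num_iterations=32)
```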
@misc{geiping_scaling_2025,
	title = {Scaling up {Test}-{Time} {Compute} with {Latent} {Reasoning}: {A} {Recurrent} {Depth} {Approach}},
	shorttitle = {Scaling up {Test}-{Time} {Compute} with {Latent} {Reasoning}},
	url = {http://arxiv.org/abs/2502.05171},
	doi = {10.48550/arXiv.2502.05171},
	abstract = {We study a novel language model architecture that is capable of scaling test-time computation by implicitly reasoning in latent space. Our model works by iterating a recurrent block, thereby unrolling to arbitrary depth at test-time. This stands in contrast to mainstream reasoning models that scale up compute by producing more tokens. Unlike approaches based on chain-of-thought, our approach does not require any specialized training data, can work with small context windows, and can capture types of reasoning that are not easily represented in words. We scale a proof-of-concept model to 3.5 billion parameters and 800 billion tokens. We show that the resulting model can improve its performance on reasoning benchmarks, sometimes dramatically, up to a computation load equivalent to 50 billion parameters.},
	urldate = {2025-04-16},
	publisher = {arXiv},
	author = {Geiping, Jonas and McLeish, Sean and Jain, Neel and Kirchenbauer, John and Singh, Siddharth and Bartoldson, Brian R. and Kailkhura, Bhavya and Bhatele, Abhinav and Goldstein, Tom},
	month = feb,
	year = {2025},
	note = {arXiv:2502.05171 [cs]},
	keywords = {Computer Science - Computation and Language, Computer Science - Machine Learning},
}
