Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach. Geiping, J., McLeish, S., Jain, N., Kirchenbauer, J., Singh, S., Bartoldson, B. R., Kailkhura, B., Bhatele, A., & Goldstein, T. February, 2025. arXiv:2502.05171 [cs]
Paper: http://arxiv.org/abs/2502.05171
Abstract: We study a novel language model architecture that is capable of scaling test-time computation by implicitly reasoning in latent space. Our model works by iterating a recurrent block, thereby unrolling to arbitrary depth at test-time. This stands in contrast to mainstream reasoning models that scale up compute by producing more tokens. Unlike approaches based on chain-of-thought, our approach does not require any specialized training data, can work with small context windows, and can capture types of reasoning that are not easily represented in words. We scale a proof-of-concept model to 3.5 billion parameters and 800 billion tokens. We show that the resulting model can improve its performance on reasoning benchmarks, sometimes dramatically, up to a computation load equivalent to 50 billion parameters.
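For context, here is a minimal sketch of the recurrent-depth idea the abstract describes, written in PyTorch under simplifying assumptions: a single shared block is iterated a caller-chosen number of times at inference, so the same weights can spend more or less test-time compute. All names (RecurrentDepthLM, num_iterations, layer sizes) are illustrative and do not reflect the paper's actual architecture or training setup.

# Illustrative sketch only: a shared block unrolled to a variable depth at test time.
# Names and sizes are assumptions, not the paper's implementation.
import torch
import torch.nn as nn


class RecurrentDepthLM(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # One shared block; the unroll depth is chosen at inference time.
        self.recurrent_block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=8, batch_first=True
        )
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, num_iterations=4):
        # Embed once, then refine the latent state by re-applying the same
        # block; more iterations spend more test-time compute without
        # producing additional tokens.
        h = self.embed(tokens)
        for _ in range(num_iterations):
            h = self.recurrent_block(h)
        return self.lm_head(h)


# Usage: the same weights can be unrolled shallowly or deeply at inference.
model = RecurrentDepthLM()
tokens = torch.randint(0, 32000, (1, 16))
shallow_logits = model(tokens, num_iterations=2)
deep_logits = model(tokens, num_iterations=32)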
@misc{geiping_scaling_2025,
title = {Scaling up {Test}-{Time} {Compute} with {Latent} {Reasoning}: {A} {Recurrent} {Depth} {Approach}},
shorttitle = {Scaling up {Test}-{Time} {Compute} with {Latent} {Reasoning}},
url = {http://arxiv.org/abs/2502.05171},
doi = {10.48550/arXiv.2502.05171},
abstract = {We study a novel language model architecture that is capable of scaling test-time computation by implicitly reasoning in latent space. Our model works by iterating a recurrent block, thereby unrolling to arbitrary depth at test-time. This stands in contrast to mainstream reasoning models that scale up compute by producing more tokens. Unlike approaches based on chain-of-thought, our approach does not require any specialized training data, can work with small context windows, and can capture types of reasoning that are not easily represented in words. We scale a proof-of-concept model to 3.5 billion parameters and 800 billion tokens. We show that the resulting model can improve its performance on reasoning benchmarks, sometimes dramatically, up to a computation load equivalent to 50 billion parameters.},
urldate = {2025-04-16},
publisher = {arXiv},
author = {Geiping, Jonas and McLeish, Sean and Jain, Neel and Kirchenbauer, John and Singh, Siddharth and Bartoldson, Brian R. and Kailkhura, Bhavya and Bhatele, Abhinav and Goldstein, Tom},
month = feb,
year = {2025},
note = {arXiv:2502.05171 [cs]},
keywords = {Computer Science - Computation and Language, Computer Science - Machine Learning},
}
{"_id":"NvDDwDr8zkJRdBmu4","bibbaseid":"geiping-mcleish-jain-kirchenbauer-singh-bartoldson-kailkhura-bhatele-etal-scalinguptesttimecomputewithlatentreasoningarecurrentdepthapproach-2025","author_short":["Geiping, J.","McLeish, S.","Jain, N.","Kirchenbauer, J.","Singh, S.","Bartoldson, B. R.","Kailkhura, B.","Bhatele, A.","Goldstein, T."],"bibdata":{"bibtype":"misc","type":"misc","title":"Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach","shorttitle":"Scaling up Test-Time Compute with Latent Reasoning","url":"http://arxiv.org/abs/2502.05171","doi":"10.48550/arXiv.2502.05171","abstract":"We study a novel language model architecture that is capable of scaling test-time computation by implicitly reasoning in latent space. Our model works by iterating a recurrent block, thereby unrolling to arbitrary depth at test-time. This stands in contrast to mainstream reasoning models that scale up compute by producing more tokens. Unlike approaches based on chain-of-thought, our approach does not require any specialized training data, can work with small context windows, and can capture types of reasoning that are not easily represented in words. We scale a proof-of-concept model to 3.5 billion parameters and 800 billion tokens. We show that the resulting model can improve its performance on reasoning benchmarks, sometimes dramatically, up to a computation load equivalent to 50 billion parameters.","urldate":"2025-04-16","publisher":"arXiv","author":[{"propositions":[],"lastnames":["Geiping"],"firstnames":["Jonas"],"suffixes":[]},{"propositions":[],"lastnames":["McLeish"],"firstnames":["Sean"],"suffixes":[]},{"propositions":[],"lastnames":["Jain"],"firstnames":["Neel"],"suffixes":[]},{"propositions":[],"lastnames":["Kirchenbauer"],"firstnames":["John"],"suffixes":[]},{"propositions":[],"lastnames":["Singh"],"firstnames":["Siddharth"],"suffixes":[]},{"propositions":[],"lastnames":["Bartoldson"],"firstnames":["Brian","R."],"suffixes":[]},{"propositions":[],"lastnames":["Kailkhura"],"firstnames":["Bhavya"],"suffixes":[]},{"propositions":[],"lastnames":["Bhatele"],"firstnames":["Abhinav"],"suffixes":[]},{"propositions":[],"lastnames":["Goldstein"],"firstnames":["Tom"],"suffixes":[]}],"month":"February","year":"2025","note":"arXiv:2502.05171 [cs]","keywords":"Computer Science - Computation and Language, Computer Science - Machine Learning","bibtex":"@misc{geiping_scaling_2025,\n\ttitle = {Scaling up {Test}-{Time} {Compute} with {Latent} {Reasoning}: {A} {Recurrent} {Depth} {Approach}},\n\tshorttitle = {Scaling up {Test}-{Time} {Compute} with {Latent} {Reasoning}},\n\turl = {http://arxiv.org/abs/2502.05171},\n\tdoi = {10.48550/arXiv.2502.05171},\n\tabstract = {We study a novel language model architecture that is capable of scaling test-time computation by implicitly reasoning in latent space. Our model works by iterating a recurrent block, thereby unrolling to arbitrary depth at test-time. This stands in contrast to mainstream reasoning models that scale up compute by producing more tokens. Unlike approaches based on chain-of-thought, our approach does not require any specialized training data, can work with small context windows, and can capture types of reasoning that are not easily represented in words. We scale a proof-of-concept model to 3.5 billion parameters and 800 billion tokens. 
We show that the resulting model can improve its performance on reasoning benchmarks, sometimes dramatically, up to a computation load equivalent to 50 billion parameters.},\n\turldate = {2025-04-16},\n\tpublisher = {arXiv},\n\tauthor = {Geiping, Jonas and McLeish, Sean and Jain, Neel and Kirchenbauer, John and Singh, Siddharth and Bartoldson, Brian R. and Kailkhura, Bhavya and Bhatele, Abhinav and Goldstein, Tom},\n\tmonth = feb,\n\tyear = {2025},\n\tnote = {arXiv:2502.05171 [cs]},\n\tkeywords = {Computer Science - Computation and Language, Computer Science - Machine Learning},\n}\n\n","author_short":["Geiping, J.","McLeish, S.","Jain, N.","Kirchenbauer, J.","Singh, S.","Bartoldson, B. R.","Kailkhura, B.","Bhatele, A.","Goldstein, T."],"key":"geiping_scaling_2025","id":"geiping_scaling_2025","bibbaseid":"geiping-mcleish-jain-kirchenbauer-singh-bartoldson-kailkhura-bhatele-etal-scalinguptesttimecomputewithlatentreasoningarecurrentdepthapproach-2025","role":"author","urls":{"Paper":"http://arxiv.org/abs/2502.05171"},"keyword":["Computer Science - Computation and Language","Computer Science - Machine Learning"],"metadata":{"authorlinks":{}}},"bibtype":"misc","biburl":"https://api.zotero.org/users/15655889/collections/G6GP9ANU/items?key=MzHVK1tHvHTcC946y3GIaoco&format=bibtex&limit=100","dataSources":["AX3fiSG9gtBuegYX9","MpmemwLeQzDcKDq6x","TSvsyzYFzoTiDesZP"],"keywords":["computer science - computation and language","computer science - machine learning"],"search_terms":["scaling","test","time","compute","latent","reasoning","recurrent","depth","approach","geiping","mcleish","jain","kirchenbauer","singh","bartoldson","kailkhura","bhatele","goldstein"],"title":"Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach","year":2025}