Chain of thought monitorability: A new and fragile opportunity for AI safety. Korbak, T., Balesni, M., Barnes, E., Bengio, Y., Benton, J., Bloom, J., Chen, M., Cooney, A., Dafoe, A., & Dragan, A. 2025.
@misc{korbak_chain_2025,
	title = {Chain of thought monitorability: {A} new and fragile opportunity for {AI} safety},
	url = {https://arxiv.org/pdf/2507.11473},
	publisher = {arXiv},
	author = {Korbak, Tomek and Balesni, Mikita and Barnes, Elizabeth and Bengio, Yoshua and Benton, Joe and Bloom, Joseph and Chen, Mark and Cooney, Alan and Dafoe, Allan and Dragan, Anca},
	year = {2025},
}
