The FineWeb datasets: Decanting the web for the finest text data at scale. Penedo, G., Kydlíček, H., Lozhkov, A., Mitchell, M., Raffel, C. A., Von Werra, L., & Wolf, T. In Advances in Neural Information Processing Systems, volume 37, pages 30811–30849, 2024.
