BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer. Asai, A., Kudugunta, S., Yu, X. V., Blevins, T., Gonen, H., Reid, M., Tsvetkov, Y., Ruder, S., & Hajishirzi, H. May 2023. arXiv:2305.14857 [cs]
Despite remarkable advancements in few-shot generalization in natural language processing, most models are developed and evaluated primarily in English. To facilitate research on few-shot cross-lingual transfer, we introduce a new benchmark, called BUFFET, which unifies 15 diverse tasks across 54 languages in a sequence-to-sequence format and provides a fixed set of few-shot examples and instructions. BUFFET is designed to establish a rigorous and equitable evaluation framework for few-shot cross-lingual transfer across a broad range of tasks and languages. Using BUFFET, we perform thorough evaluations of state-of-the-art multilingual large language models with different transfer methods, namely in-context learning and fine-tuning. Our findings reveal significant room for improvement in few-shot in-context cross-lingual transfer. In particular, ChatGPT with in-context learning often performs worse than much smaller mT5-base models fine-tuned on English task data and few-shot in-language examples. Our analysis suggests various avenues for future research in few-shot cross-lingual transfer, such as improved pretraining, understanding, and future evaluations.
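The abstract's contrast between in-context learning and fine-tuning as transfer methods is easiest to see in code. Below is a minimal sketch, not BUFFET's actual interface or prompt format, of the few-shot in-context pattern the paper evaluates: a fixed instruction, a fixed set of k demonstrations, and a test input serialized into a single sequence-to-sequence prompt. The function name, record fields, and German sentiment examples are illustrative assumptions.

	# Minimal sketch of few-shot in-context cross-lingual transfer
	# (illustrative only; not BUFFET's actual schema or prompts).
	from dataclasses import dataclass

	@dataclass
	class FewShotExample:
	    source: str  # task input, here an in-language (German) review
	    target: str  # expected output string in seq2seq format

	def build_icl_prompt(instruction, demos, test_input):
	    """Serialize an instruction, k fixed demonstrations, and a test input."""
	    parts = [instruction]
	    for ex in demos:
	        parts.append(f"Input: {ex.source}\nOutput: {ex.target}")
	    parts.append(f"Input: {test_input}\nOutput:")
	    return "\n\n".join(parts)

	demos = [
	    FewShotExample("Das Essen war ausgezeichnet.", "positive"),
	    FewShotExample("Der Service war sehr langsam.", "negative"),
	]
	print(build_icl_prompt(
	    "Classify the sentiment of the review as positive or negative.",
	    demos,
	    "Ich komme gerne wieder.",
	))

Keeping the instruction and demonstrations fixed across models, as BUFFET does, is what makes comparisons between in-context learning and fine-tuned baselines equitable.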
@misc{asai_buffet_2023,
	title = {{BUFFET}: {Benchmarking} {Large} {Language} {Models} for {Few}-shot {Cross}-lingual {Transfer}},
	shorttitle = {{BUFFET}},
	url = {http://arxiv.org/abs/2305.14857},
	doi = {10.48550/arXiv.2305.14857},
	abstract = {Despite remarkable advancements in few-shot generalization in natural language processing, most models are developed and evaluated primarily in English. To facilitate research on few-shot cross-lingual transfer, we introduce a new benchmark, called BUFFET, which unifies 15 diverse tasks across 54 languages in a sequence-to-sequence format and provides a fixed set of few-shot examples and instructions. BUFFET is designed to establish a rigorous and equitable evaluation framework for few-shot cross-lingual transfer across a broad range of tasks and languages. Using BUFFET, we perform thorough evaluations of state-of-the-art multilingual large language models with different transfer methods, namely in-context learning and fine-tuning. Our findings reveal significant room for improvement in few-shot in-context cross-lingual transfer. In particular, ChatGPT with in-context learning often performs worse than much smaller mT5-base models fine-tuned on English task data and few-shot in-language examples. Our analysis suggests various avenues for future research in few-shot cross-lingual transfer, such as improved pretraining, understanding, and future evaluations.},
	urldate = {2024-05-14},
	publisher = {arXiv},
	author = {Asai, Akari and Kudugunta, Sneha and Yu, Xinyan Velocity and Blevins, Terra and Gonen, Hila and Reid, Machel and Tsvetkov, Yulia and Ruder, Sebastian and Hajishirzi, Hannaneh},
	month = may,
	year = {2023},
	note = {arXiv:2305.14857 [cs]},
	keywords = {Computer Science - Computation and Language},
}
