BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer. Asai, A., Kudugunta, S., Yu, X. V., Blevins, T., Gonen, H., Reid, M., Tsvetkov, Y., Ruder, S., & Hajishirzi, H. May, 2023. arXiv:2305.14857 [cs]
Abstract: Despite remarkable advancements in few-shot generalization in natural language processing, most models are developed and evaluated primarily in English. To facilitate research on few-shot cross-lingual transfer, we introduce a new benchmark, called BUFFET, which unifies 15 diverse tasks across 54 languages in a sequence-to-sequence format and provides a fixed set of few-shot examples and instructions. BUFFET is designed to establish a rigorous and equitable evaluation framework for few-shot cross-lingual transfer across a broad range of tasks and languages. Using BUFFET, we perform thorough evaluations of state-of-the-art multilingual large language models with different transfer methods, namely in-context learning and fine-tuning. Our findings reveal significant room for improvement in few-shot in-context cross-lingual transfer. In particular, ChatGPT with in-context learning often performs worse than much smaller mT5-base models fine-tuned on English task data and few-shot in-language examples. Our analysis suggests various avenues for future research in few-shot cross-lingual transfer, such as improved pretraining, understanding, and future evaluations.
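The benchmark's central design point, per the abstract, is a unified sequence-to-sequence format in which each task pairs an instruction with a fixed set of few-shot demonstrations. The sketch below illustrates how such an in-context prompt might be assembled; the record fields, the build_prompt helper, and the Swahili sentiment examples are hypothetical illustrations, not BUFFET's actual schema or code.

```python
# Hypothetical sketch of assembling a BUFFET-style few-shot in-context
# prompt: one task instruction, k fixed in-language demonstrations, and
# the test input, concatenated as a single sequence-to-sequence prompt.
# Field names and data are illustrative assumptions; consult the BUFFET
# release for the real format.
from dataclasses import dataclass


@dataclass
class FewShotExample:
    input: str   # task input in the target language
    output: str  # gold answer, serialized as the target sequence


def build_prompt(instruction: str, demos: list[FewShotExample], query: str) -> str:
    """Concatenate instruction, demonstrations, and the test input."""
    parts = [instruction]
    for demo in demos:
        parts.append(f"Input: {demo.input}\nOutput: {demo.output}")
    # The model is asked to continue after the final "Output:" marker.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


# Toy usage: 2-shot sentiment classification with in-language (Swahili) demos.
demos = [
    FewShotExample("Filamu hii ilikuwa nzuri sana.", "positive"),
    FewShotExample("Huduma ilikuwa mbaya.", "negative"),
]
prompt = build_prompt(
    "Classify the sentiment of the text as positive or negative.",
    demos,
    "Chakula kilikuwa kitamu.",
)
print(prompt)
```

Keeping the demonstrations fixed per task and language, as the abstract describes, is what makes comparisons across models and transfer methods (in-context learning vs. fine-tuning) equitable rather than sensitive to demonstration sampling.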
@misc{asai_buffet_2023,
title = {{BUFFET}: {Benchmarking} {Large} {Language} {Models} for {Few}-shot {Cross}-lingual {Transfer}},
shorttitle = {{BUFFET}},
url = {http://arxiv.org/abs/2305.14857},
doi = {10.48550/arXiv.2305.14857},
abstract = {Despite remarkable advancements in few-shot generalization in natural language processing, most models are developed and evaluated primarily in English. To facilitate research on few-shot cross-lingual transfer, we introduce a new benchmark, called BUFFET, which unifies 15 diverse tasks across 54 languages in a sequence-to-sequence format and provides a fixed set of few-shot examples and instructions. BUFFET is designed to establish a rigorous and equitable evaluation framework for few-shot cross-lingual transfer across a broad range of tasks and languages. Using BUFFET, we perform thorough evaluations of state-of-the-art multilingual large language models with different transfer methods, namely in-context learning and fine-tuning. Our findings reveal significant room for improvement in few-shot in-context cross-lingual transfer. In particular, ChatGPT with in-context learning often performs worse than much smaller mT5-base models fine-tuned on English task data and few-shot in-language examples. Our analysis suggests various avenues for future research in few-shot cross-lingual transfer, such as improved pretraining, understanding, and future evaluations.},
urldate = {2024-05-14},
publisher = {arXiv},
author = {Asai, Akari and Kudugunta, Sneha and Yu, Xinyan Velocity and Blevins, Terra and Gonen, Hila and Reid, Machel and Tsvetkov, Yulia and Ruder, Sebastian and Hajishirzi, Hannaneh},
month = may,
year = {2023},
note = {arXiv:2305.14857 [cs]},
keywords = {Computer Science - Computation and Language},
}
{"_id":"H9TA8HTrRWB2MRTyL","bibbaseid":"asai-kudugunta-yu-blevins-gonen-reid-tsvetkov-ruder-etal-buffetbenchmarkinglargelanguagemodelsforfewshotcrosslingualtransfer-2023","author_short":["Asai, A.","Kudugunta, S.","Yu, X. V.","Blevins, T.","Gonen, H.","Reid, M.","Tsvetkov, Y.","Ruder, S.","Hajishirzi, H."],"bibdata":{"bibtype":"misc","type":"misc","title":"BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer","shorttitle":"BUFFET","url":"http://arxiv.org/abs/2305.14857","doi":"10.48550/arXiv.2305.14857","abstract":"Despite remarkable advancements in few-shot generalization in natural language processing, most models are developed and evaluated primarily in English. To facilitate research on few-shot cross-lingual transfer, we introduce a new benchmark, called BUFFET, which unifies 15 diverse tasks across 54 languages in a sequence-to-sequence format and provides a fixed set of few-shot examples and instructions. BUFFET is designed to establish a rigorous and equitable evaluation framework for few-shot cross-lingual transfer across a broad range of tasks and languages. Using BUFFET, we perform thorough evaluations of state-of-the-art multilingual large language models with different transfer methods, namely in-context learning and fine-tuning. Our findings reveal significant room for improvement in few-shot in-context cross-lingual transfer. In particular, ChatGPT with in-context learning often performs worse than much smaller mT5-base models fine-tuned on English task data and few-shot in-language examples. Our analysis suggests various avenues for future research in few-shot cross-lingual transfer, such as improved pretraining, understanding, and future evaluations.","urldate":"2024-05-14","publisher":"arXiv","author":[{"propositions":[],"lastnames":["Asai"],"firstnames":["Akari"],"suffixes":[]},{"propositions":[],"lastnames":["Kudugunta"],"firstnames":["Sneha"],"suffixes":[]},{"propositions":[],"lastnames":["Yu"],"firstnames":["Xinyan","Velocity"],"suffixes":[]},{"propositions":[],"lastnames":["Blevins"],"firstnames":["Terra"],"suffixes":[]},{"propositions":[],"lastnames":["Gonen"],"firstnames":["Hila"],"suffixes":[]},{"propositions":[],"lastnames":["Reid"],"firstnames":["Machel"],"suffixes":[]},{"propositions":[],"lastnames":["Tsvetkov"],"firstnames":["Yulia"],"suffixes":[]},{"propositions":[],"lastnames":["Ruder"],"firstnames":["Sebastian"],"suffixes":[]},{"propositions":[],"lastnames":["Hajishirzi"],"firstnames":["Hannaneh"],"suffixes":[]}],"month":"May","year":"2023","note":"arXiv:2305.14857 [cs]","keywords":"Computer Science - Computation and Language","bibtex":"@misc{asai_buffet_2023,\n\ttitle = {{BUFFET}: {Benchmarking} {Large} {Language} {Models} for {Few}-shot {Cross}-lingual {Transfer}},\n\tshorttitle = {{BUFFET}},\n\turl = {http://arxiv.org/abs/2305.14857},\n\tdoi = {10.48550/arXiv.2305.14857},\n\tabstract = {Despite remarkable advancements in few-shot generalization in natural language processing, most models are developed and evaluated primarily in English. To facilitate research on few-shot cross-lingual transfer, we introduce a new benchmark, called BUFFET, which unifies 15 diverse tasks across 54 languages in a sequence-to-sequence format and provides a fixed set of few-shot examples and instructions. BUFFET is designed to establish a rigorous and equitable evaluation framework for few-shot cross-lingual transfer across a broad range of tasks and languages. 
Using BUFFET, we perform thorough evaluations of state-of-the-art multilingual large language models with different transfer methods, namely in-context learning and fine-tuning. Our findings reveal significant room for improvement in few-shot in-context cross-lingual transfer. In particular, ChatGPT with in-context learning often performs worse than much smaller mT5-base models fine-tuned on English task data and few-shot in-language examples. Our analysis suggests various avenues for future research in few-shot cross-lingual transfer, such as improved pretraining, understanding, and future evaluations.},\n\turldate = {2024-05-14},\n\tpublisher = {arXiv},\n\tauthor = {Asai, Akari and Kudugunta, Sneha and Yu, Xinyan Velocity and Blevins, Terra and Gonen, Hila and Reid, Machel and Tsvetkov, Yulia and Ruder, Sebastian and Hajishirzi, Hannaneh},\n\tmonth = may,\n\tyear = {2023},\n\tnote = {arXiv:2305.14857 [cs]},\n\tkeywords = {Computer Science - Computation and Language},\n}\n\n\n\n\n\n\n\n\n\n\n\n","author_short":["Asai, A.","Kudugunta, S.","Yu, X. V.","Blevins, T.","Gonen, H.","Reid, M.","Tsvetkov, Y.","Ruder, S.","Hajishirzi, H."],"key":"asai_buffet_2023","id":"asai_buffet_2023","bibbaseid":"asai-kudugunta-yu-blevins-gonen-reid-tsvetkov-ruder-etal-buffetbenchmarkinglargelanguagemodelsforfewshotcrosslingualtransfer-2023","role":"author","urls":{"Paper":"http://arxiv.org/abs/2305.14857"},"keyword":["Computer Science - Computation and Language"],"metadata":{"authorlinks":{}},"html":""},"bibtype":"misc","biburl":"https://bibbase.org/zotero/SimonOst","dataSources":["tFvQtGDPkqnJa6Fbq"],"keywords":["computer science - computation and language"],"search_terms":["buffet","benchmarking","large","language","models","few","shot","cross","lingual","transfer","asai","kudugunta","yu","blevins","gonen","reid","tsvetkov","ruder","hajishirzi"],"title":"BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer","year":2023}