{"_id":"93Rzzw2xBRPigRCPL","bibbaseid":"gheini-ren-may-crossattentionisallyouneedadaptingpretrainedtransformersformachinetranslation-2021","author_short":["Gheini, M.","Ren, X.","May, J."],"bibdata":{"bibtype":"inproceedings","type":"inproceedings","title":"Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation","author":[{"propositions":[],"lastnames":["Gheini"],"firstnames":["Mozhdeh"],"suffixes":[]},{"propositions":[],"lastnames":["Ren"],"firstnames":["Xiang"],"suffixes":[]},{"propositions":[],"lastnames":["May"],"firstnames":["Jonathan"],"suffixes":[]}],"booktitle":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","month":"November","year":"2021","address":"Online and Punta Cana, Dominican Republic","publisher":"Association for Computational Linguistics","url":"https://aclanthology.org/2021.emnlp-main.132","pages":"1754–1765","abstract":"We study the power of cross-attention in the Transformer architecture within the context of transfer learning for machine translation, and extend the findings of studies into cross-attention when training from scratch. We conduct a series of experiments through fine-tuning a translation model on data where either the source or target language has changed. These experiments reveal that fine-tuning only the cross-attention parameters is nearly as effective as fine-tuning all parameters (i.e., the entire translation model). We provide insights into why this is the case and observe that limiting fine-tuning in this manner yields cross-lingually aligned embeddings. The implications of this finding for researchers and practitioners include a mitigation of catastrophic forgetting, the potential for zero-shot translation, and the ability to extend machine translation models to several new language pairs with reduced parameter storage overhead.","bibtex":"@inproceedings{gheini-etal-2021-cross,\n title = \"Cross-Attention is All You Need: {A}dapting Pretrained {T}ransformers for Machine Translation\",\n author = \"Gheini, Mozhdeh and\n Ren, Xiang and\n May, Jonathan\",\n booktitle = \"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing\",\n month = nov,\n year = \"2021\",\n address = \"Online and Punta Cana, Dominican Republic\",\n publisher = \"Association for Computational Linguistics\",\n url = \"https://aclanthology.org/2021.emnlp-main.132\",\n pages = \"1754--1765\",\n abstract = \"We study the power of cross-attention in the Transformer architecture within the context of transfer learning for machine translation, and extend the findings of studies into cross-attention when training from scratch. We conduct a series of experiments through fine-tuning a translation model on data where either the source or target language has changed. These experiments reveal that fine-tuning only the cross-attention parameters is nearly as effective as fine-tuning all parameters (i.e., the entire translation model). We provide insights into why this is the case and observe that limiting fine-tuning in this manner yields cross-lingually aligned embeddings. 
The implications of this finding for researchers and practitioners include a mitigation of catastrophic forgetting, the potential for zero-shot translation, and the ability to extend machine translation models to several new language pairs with reduced parameter storage overhead.\",\n}\n\n\n","author_short":["Gheini, M.","Ren, X.","May, J."],"key":"gheini-etal-2021-cross","id":"gheini-etal-2021-cross","bibbaseid":"gheini-ren-may-crossattentionisallyouneedadaptingpretrainedtransformersformachinetranslation-2021","role":"author","urls":{"Paper":"https://aclanthology.org/2021.emnlp-main.132"},"metadata":{"authorlinks":{}}},"bibtype":"inproceedings","biburl":"https://jonmay.github.io/webpage/cutelabname/cutelabname.bib","dataSources":["e4Q8Z3dm3WgDhb6Dj","hbZSwot2msWk92m5B","fcWjcoAgajPvXWcp7","GvHfaAWP6AfN6oLQE","j3Qzx9HAAC6WtJDHS","5eM3sAccSEpjSDHHQ"],"keywords":[],"search_terms":["cross","attention","need","adapting","pretrained","transformers","machine","translation","gheini","ren","may"],"title":"Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation","year":2021}
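
The following is a minimal sketch (not the authors' code, and not tied to their training setup) of the fine-tuning regime the abstract describes: freeze every parameter of a pretrained encoder-decoder Transformer except the decoder's cross-attention modules, then fine-tune on the new language pair. It uses torch.nn.Transformer purely for illustration; in PyTorch's TransformerDecoderLayer the cross-attention block is the attribute named multihead_attn.

import torch
import torch.nn as nn

# Stand-in for a pretrained translation model (illustrative only).
model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6)

# Freeze all parameters first.
for p in model.parameters():
    p.requires_grad = False

# Unfreeze only the cross-attention modules in each decoder layer;
# self-attention, feed-forward layers, and the encoder stay frozen.
for layer in model.decoder.layers:
    for p in layer.multihead_attn.parameters():
        p.requires_grad = True

# Optimize just the cross-attention parameters during fine-tuning.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=5e-4)
print(f"trainable parameters: {sum(p.numel() for p in trainable):,}")

In practice, adapting to a new source or target language would also require training embeddings for the new vocabulary, which this sketch omits.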