Machine Translation Between High-resource Languages in a Language Documentation Setting. Kann, K., Ebrahimi, A., Stenzel, K., & Palmer, A. In Proceedings of the First Workshop on NLP Applications to Field Linguistics, pages 26–33, Gyeongju, Republic of Korea, October, 2022. International Conference on Computational Linguistics.
Language documentation encompasses translation, typically into the dominant high-resource language in the region where the target language is spoken. To make data accessible to a broader audience, additional translation into other high-resource languages might be needed. Working within a project documenting Kotiria, we explore the extent to which state-of-the-art machine translation (MT) systems can support this second translation – in our case from Portuguese to English. This translation task is challenging for multiple reasons: (1) the data is out-of-domain with respect to the MT system's training data, (2) much of the data is conversational, (3) existing translations include non-standard and uncommon expressions, often reflecting properties of the documented language, and (4) the data includes borrowings from other regional languages. Despite these challenges, existing MT systems perform at a usable level, though there is still room for improvement. We then conduct a qualitative analysis and suggest ways to improve MT between high-resource languages in a language documentation setting.
@inproceedings{kann_machine_2022,
	address = {Gyeongju, Republic of Korea},
	title = {Machine {Translation} {Between} {High}-resource {Languages} in a {Language} {Documentation} {Setting}},
	url = {https://aclanthology.org/2022.fieldmatters-1.3},
	abstract = {Language documentation encompasses translation, typically into the dominant high-resource language in the region where the target language is spoken. To make data accessible to a broader audience, additional translation into other high-resource languages might be needed. Working within a project documenting Kotiria, we explore the extent to which state-of-the-art machine translation (MT) systems can support this second translation – in our case from Portuguese to English. This translation task is challenging for multiple reasons: (1) the data is out-of-domain with respect to the MT system's training data, (2) much of the data is conversational, (3) existing translations include non-standard and uncommon expressions, often reflecting properties of the documented language, and (4) the data includes borrowings from other regional languages. Despite these challenges, existing MT systems perform at a usable level, though there is still room for improvement. We then conduct a qualitative analysis and suggest ways to improve MT between high-resource languages in a language documentation setting.},
	urldate = {2023-01-09},
	booktitle = {Proceedings of the {First} {Workshop} on {NLP} {Applications} to {Field} {Linguistics}},
	publisher = {International Conference on Computational Linguistics},
	author = {Kann, Katharina and Ebrahimi, Abteen and Stenzel, Kristine and Palmer, Alexis},
	month = oct,
	year = {2022},
	pages = {26--33},
}
