Massively Multilingual Pronunciation Modeling with WikiPron. Lee, J. L., Ashby, L. F., Garza, M. E., Lee-Sikka, Y., Miller, S., Wong, A., McCarthy, A. D., & Gorman, K. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 4223–4228, Marseille, France, May 2020. European Language Resources Association.
We introduce WikiPron, an open-source command-line tool for extracting pronunciation data from Wiktionary, a collaborative multilingual online dictionary. We first describe the design and use of WikiPron. We then discuss the challenges faced in scaling this tool to create an automatically generated database of 1.7 million pronunciations from 165 languages. Finally, we validate the pronunciation database by using it to train and evaluate a collection of generic grapheme-to-phoneme models. The software, pronunciation data, and models are all made available under permissive open-source licenses.
@inproceedings{lee_massively_2020,
	address = {Marseille, France},
	title = {Massively {Multilingual} {Pronunciation} {Modeling} with {WikiPron}},
	isbn = {979-10-95546-34-4},
	url = {https://aclanthology.org/2020.lrec-1.521},
	abstract = {We introduce WikiPron, an open-source command-line tool for extracting pronunciation data from Wiktionary, a collaborative multilingual online dictionary. We first describe the design and use of WikiPron. We then discuss the challenges faced in scaling this tool to create an automatically generated database of 1.7 million pronunciations from 165 languages. Finally, we validate the pronunciation database by using it to train and evaluate a collection of generic grapheme-to-phoneme models. The software, pronunciation data, and models are all made available under permissive open-source licenses.},
	language = {English},
	urldate = {2023-04-05},
	booktitle = {Proceedings of the {Twelfth} {Language} {Resources} and {Evaluation} {Conference}},
	publisher = {European Language Resources Association},
	author = {Lee, Jackson L. and Ashby, Lucas F.E. and Garza, M. Elizabeth and Lee-Sikka, Yeonju and Miller, Sean and Wong, Alan and McCarthy, Arya D. and Gorman, Kyle},
	month = may,
	year = {2020},
	pages = {4223--4228},
}
