Data-oriented methods for grapheme-to-phoneme conversion. van den Bosch, A. & Daelemans, W. In Proceedings of the sixth conference on European chapter of the Association for Computational Linguistics, EACL '93, pages 45–53, USA, April 1993. Association for Computational Linguistics.
It is traditionally assumed that various sources of linguistic knowledge and their interaction should be formalised in order to be able to convert words into their phonemic representations with reasonable accuracy. We show that using supervised learning techniques, based on a corpus of transcribed words, the same and even better performance can be achieved, without explicit modeling of linguistic knowledge. In this paper we present two instances of this approach. A first model implements a variant of instance-based learning, in which a weighted similarity metric and a database of prototypical exemplars are used to predict new mappings. In the second model, grapheme-to-phoneme mappings are looked up in a compressed text-to-speech lexicon (table lookup) enriched with default mappings. We compare performance and accuracy of these approaches to a connectionist (backpropagation) approach and to the linguistic knowledge-based approach.
@inproceedings{van_den_bosch_data-oriented_1993,
	address = {USA},
	series = {{EACL} '93},
	title = {Data-oriented methods for grapheme-to-phoneme conversion},
	isbn = {978-90-5434-014-0},
	url = {https://doi.org/10.3115/976744.976751},
	doi = {10.3115/976744.976751},
	abstract = {It is traditionally assumed that various sources of linguistic knowledge and their interaction should be formalised in order to be able to convert words into their phonemic representations with reasonable accuracy. We show that using supervised learning techniques, based on a corpus of transcribed words, the same and even better performance can be achieved, without explicit modeling of linguistic knowledge. In this paper we present two instances of this approach. A first model implements a variant of instance-based learning, in which a weighted similarity metric and a database of prototypical exemplars are used to predict new mappings. In the second model, grapheme-to-phoneme mappings are looked up in a compressed text-to-speech lexicon (table lookup) enriched with default mappings. We compare performance and accuracy of these approaches to a connectionist (backpropagation) approach and to the linguistic knowledge-based approach.},
	urldate = {2022-10-08},
	booktitle = {Proceedings of the sixth conference on {European} chapter of the {Association} for {Computational} {Linguistics}},
	publisher = {Association for Computational Linguistics},
	author = {van den Bosch, Antal and Daelemans, Walter},
	month = apr,
	year = {1993},
	pages = {45--53},
}
