Aiuruetê: A high-quality concatenative text-to-speech system for Brazilian Portuguese with demisyllabic analysis-based units and a hierarchical model of rhythm production. Barbosa, P. A.; Violaro, F.; Albano, E. C.; Simões, F. O.; Aquino, P.; Madureira, S.; and Françozo, E. Volume 5, Budapest, Hungary, September 5-9, 1999.
Aiuruetê is a high-quality concatenative TTS system for Brazilian Portuguese. Its name (pronounced [!"#\$%#&'(&]) illustrates the challenges we have fixed as a research paradigm: to feed the system with the specificities of our language, highlighted by an up-to-date discussion of the Phonology/Phonetics and prosody/segments interfaces, without a huge computational cost. The choice for the concatenative method of synthesis was determined by a trade-off between scientific (the desired human-like naturalness of the acoustic output) and practical (mainly reduced staff and tight schedule) constraints. Procedural and declarative modules are described here: the ortofon, the unit inventory, the rhythm model and the synthesis techniques. Aiuruetê is still being evaluated, but when compared to the previous system, adopted by the national telephony company, its superior quality is apparent. The grapheme-to-phone converter (henceforth Ortofon), the unit inventory and the rhythm model have inherited most of their high-quality features from discussions concerning the Phonetics-Phonology interface and the prosody-segments interaction, both in the light of a dynamical-system perspective (cf. Browman and Goldstein's Articulatory Phonology [7][8] and Port's Temporal Phonology [19]). Before setting the system into operation, it is possible to select one from two pitch-synchronous synthesis techniques: TD-PSOLA or the hybrid model. All modules and interfaces, mostly implemented in C++, form a classical concatenative TTS system layout which was installed in a PC with a friendly DELPHI interface for the user [21]. All modules were independently developed by different research teams.
@inproceedings{barbosa_aiuruete:_1999,
	Author = {Barbosa, Plínio Almeida and Violaro, Fábio and Albano, Eleonora Cavalcante and Simões, Flávio Olmos and Aquino, Patrícia and Madureira, Sandra and Françozo, Edson},
	Year = {1999},
	Date-Modified = {2016-09-23 19:24:00 +0000},
	Keywords = {phonetics, Portuguese, prosody, rhythm, speech synthesis, speech technology, temporal factors, text-to-speech, text-to-speech system},
	Address = {Budapest, Hungary},
	Booktitle = {6th European Conference on Speech Communication and Technology (EUROSPEECH 1999), September 5--9, 1999},
	Title = {Aiuruetê: A high-quality concatenative text-to-speech system for Brazilian Portuguese with demisyllabic analysis-based units and a hierarchical model of rhythm production},
	Url = {http://www.isca-speech.org/archive/eurospeech_1999/e99_2059.html},
	Volume = {5},
	Abstract = {Aiuruetê is a high-quality concatenative TTS system for Brazilian Portuguese. Its name (pronounced [!"\#\$\%\#\&'(\&]) illustrates the challenges we have fixed as a research paradigm: to feed the system with the specificities of our language, highlighted by an up-to-date discussion of the Phonology/Phonetics and prosody/segments interfaces, without a huge computational cost. The choice for the concatenative method of synthesis was determined by a trade-off between scientific (the desired human-like naturalness of the acoustic output) and practical (mainly reduced staff and tight schedule) constraints. Procedural and declarative modules are described here: the ortofon, the unit inventory, the rhythm model and the synthesis techniques. Aiuruetê is still being evaluated, but when compared to the previous system, adopted by the national telephony company, its superior quality is apparent. The grapheme-to-phone converter (henceforth Ortofon), the unit inventory and the rhythm model have inherited most of their high-quality features from discussions concerning the Phonetics-Phonology interface and the prosody-segments interaction, both in the light of a dynamical-system perspective (cf. Browman and Goldstein's Articulatory Phonology [7][8] and Port's Temporal Phonology [19]). Before setting the system into operation, it is possible to select one from two pitch-synchronous synthesis techniques: TD-PSOLA or the hybrid model. All modules and interfaces, mostly implemented in C++, form a classical concatenative TTS system layout which was installed in a PC with a friendly DELPHI interface for the user [21]. All modules were independently developed by different research teams.},
	Bdsk-Url-1 = {http://www.isca-speech.org/archive/eurospeech_1999/e99_2059.html}}