Development of Community-Oriented Text-to-Speech Models for Māori ‘Avaiki Nui (Cook Islands Māori)

Development of Community-Oriented Text-to-Speech Models for Māori ‘Avaiki Nui (Cook Islands Māori). James, J., Coto-Solano, R., Nicholas, S. A., Zhu, J., Yu, B., Babasaki, F., Wang, J. T., & Derby, N. In Calzolari, N., Kan, M., Hoste, V., Lenci, A., Sakti, S., & Xue, N., editors, Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 4820–4831, Torino, Italia, May, 2024. ELRA and ICCL.


In this paper we describe the development of a text-to-speech system for Māori ‘Avaiki Nui (Cook Islands Māori). We provide details about the process of community collaboration that was followed throughout the project, a continued engagement where we are trying to develop speech and language technology for the benefit of the community. During this process we gathered a group of recordings that we used to train a TTS system. When training we used two approaches, the HMM-based system MaryTTS (Schröder et al., 2011) and the deep learning system FastSpeech2 (Ren et al., 2020). We performed two evaluation tasks on the models: First, we measured their quality by having the synthesized speech transcribed by ASR. The human-produced ground truth had the lowest error rates (CER=4.3, WER=18), but the FastSpeech2 audio had lower error rates (CER=11.8 and WER=42.7) than the MaryTTS voice (CER=17.9 and WER=48.1). The second evaluation was a survey amongst speakers of the language so they could judge the voice's quality. The ground truth was rated with the highest quality (MOS=4.6), but the FastSpeech2 voice had an overall quality of MOS=3.2, which was significantly higher than that of the MaryTTS synthesized recordings (MOS=2.0). We intend to use the FastSpeech2 model to create language learning tools for community members both on the Cook Islands and in the diaspora.

@inproceedings{james_development_2024,
	address = {Torino, Italia},
	title = {Development of {Community}-{Oriented} {Text}-to-{Speech} {Models} for {Māori} ‘{Avaiki} {Nui} ({Cook} {Islands} {Māori})},
	url = {https://aclanthology.org/2024.lrec-main.432/},
	abstract = {In this paper we describe the development of a text-to-speech system for Māori ‘Avaiki Nui (Cook Islands Māori). We provide details about the process of community collaboration that was followed throughout the project, a continued engagement where we are trying to develop speech and language technology for the benefit of the community. During this process we gathered a group of recordings that we used to train a TTS system. When training we used two approaches, the HMM-based system MaryTTS (Schröder et al., 2011) and the deep learning system FastSpeech2 (Ren et al., 2020). We performed two evaluation tasks on the models: First, we measured their quality by having the synthesized speech transcribed by ASR. The human-produced ground truth had the lowest error rates (CER=4.3, WER=18), but the FastSpeech2 audio had lower error rates (CER=11.8 and WER=42.7) than the MaryTTS voice (CER=17.9 and WER=48.1). The second evaluation was a survey amongst speakers of the language so they could judge the voice's quality. The ground truth was rated with the highest quality (MOS=4.6), but the FastSpeech2 voice had an overall quality of MOS=3.2, which was significantly higher than that of the MaryTTS synthesized recordings (MOS=2.0). We intend to use the FastSpeech2 model to create language learning tools for community members both on the Cook Islands and in the diaspora.},
	urldate = {2025-05-05},
	booktitle = {Proceedings of the 2024 {Joint} {International} {Conference} on {Computational} {Linguistics}, {Language} {Resources} and {Evaluation} ({LREC}-{COLING} 2024)},
	publisher = {ELRA and ICCL},
	author = {James, Jesin and Coto-Solano, Rolando and Nicholas, Sally Akevai and Zhu, Joshua and Yu, Bovey and Babasaki, Fuki and Wang, Jenny Tyler and Derby, Nicholas},
	editor = {Calzolari, Nicoletta and Kan, Min-Yen and Hoste, Veronique and Lenci, Alessandro and Sakti, Sakriani and Xue, Nianwen},
	month = may,
	year = {2024},
	pages = {4820--4831},
}
