Pitch prediction from Mel-generalized cepstrum — a computationally efficient pitch modeling approach for speech synthesis. Rao, M. V. A. & Ghosh, P. K. In 2017 25th European Signal Processing Conference (EUSIPCO), pages 1629-1633, Aug, 2017. Text-to-speech (TTS) systems are often used as part of the user interface in wearable devices. Because wearable devices have limited memory and computational/battery power, it could be useful to have a TTS system that requires less memory and is less computationally intensive. Conventional speech synthesis systems have separate models for pitch (F0-model) and for the spectral representation, namely Mel generalized coefficients (MGC) (MGC-model). In this paper we estimate pitch from the MGC estimated using the MGC-model instead of having a separate F0-model. Pitch is obtained from the estimated MGC using a statistical mapping through a Gaussian mixture model (GMM). Experiments using the CMU-ARCTIC database demonstrate that the proposed GMM-based F0-model, even with a single mixture, results in no significant loss in the naturalness of the synthesized speech, while the proposed F0-model, in addition to reducing computational complexity, results in an approximately 93% reduction in the number of parameters compared to that of the conventional F0-model.
@InProceedings{8081485,
author = {M. V. A. Rao and P. K. Ghosh},
booktitle = {2017 25th European Signal Processing Conference (EUSIPCO)},
title = {Pitch prediction from Mel-generalized cepstrum — a computationally efficient pitch modeling approach for speech synthesis},
year = {2017},
pages = {1629--1633},
abstract = {Text-to-speech (TTS) systems are often used as part of the user interface in wearable devices. Because wearable devices have limited memory and computational/battery power, it could be useful to have a TTS system that requires less memory and is less computationally intensive. Conventional speech synthesis systems have separate models for pitch (F0-model) and for the spectral representation, namely Mel generalized coefficients (MGC) (MGC-model). In this paper we estimate pitch from the MGC estimated using the MGC-model instead of having a separate F0-model. Pitch is obtained from the estimated MGC using a statistical mapping through a Gaussian mixture model (GMM). Experiments using the CMU-ARCTIC database demonstrate that the proposed GMM-based F0-model, even with a single mixture, results in no significant loss in the naturalness of the synthesized speech, while the proposed F0-model, in addition to reducing computational complexity, results in an approximately 93\% reduction in the number of parameters compared to that of the conventional F0-model.},
keywords = {cepstral analysis;computational complexity;Gaussian processes;mixture models;signal representation;speech processing;speech synthesis;speech-based user interfaces;pitch prediction;Mel-generalized cepstrum;computationally efficient pitch modeling approach;user interface;wearable devices;TTS system;spectral representation;Mel generalized coefficients;MGC-model;estimated MGC;Gaussian mixture model;GMM based F0-model;computational complexity;F0-model;text-to-speech synthesis systems;statistical mapping;CMU-ARCTIC database;pitch estimation;Hidden Markov models;High-temperature superconductors;Speech;Computational modeling;Speech synthesis;Training;Covariance matrices},
doi = {10.23919/EUSIPCO.2017.8081485},
issn = {2076-1465},
month = {Aug},
}