Grapheme-to-Phoneme Conversion with Convolutional Neural Networks

Grapheme-to-Phoneme Conversion with Convolutional Neural Networks. Yolchuyeva, S., Németh, G., & Gyires-Tóth, B. Applied Sciences, 9(6):1143, January, 2019. Number: 6 Publisher: Multidisciplinary Digital Publishing Institute


Grapheme-to-phoneme (G2P) conversion is the process of generating pronunciation for words based on their written form. It has a highly essential role for natural language processing, text-to-speech synthesis and automatic speech recognition systems. In this paper, we investigate convolutional neural networks (CNN) for G2P conversion. We propose a novel CNN-based sequence-to-sequence (seq2seq) architecture for G2P conversion. Our approach includes an end-to-end CNN G2P conversion with residual connections and, furthermore, a model that utilizes a convolutional neural network (with and without residual connections) as encoder and Bi-LSTM as a decoder. We compare our approach with state-of-the-art methods, including Encoder-Decoder LSTM and Encoder-Decoder Bi-LSTM. Training and inference times, phoneme and word error rates were evaluated on the public CMUDict dataset for US English, and the best performing convolutional neural network-based architecture was also evaluated on the NetTalk dataset. Our method approaches the accuracy of previous state-of-the-art results in terms of phoneme error rate.
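
The abstract reports phoneme and word error rates but does not define them. The sketch below uses the definitions that are common in G2P work, which may differ in detail from the paper's own evaluation (for example, in how words with several reference pronunciations are handled). Phoneme error rate is taken as the total Levenshtein distance between predicted and reference phoneme sequences divided by the total number of reference phonemes, and word error rate as the fraction of words whose predicted sequence is not an exact match. The function names and the toy data are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> float:
    """Total edit distance divided by total number of reference phonemes."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_phonemes = sum(len(r) for r in refs)
    return total_edits / total_phonemes


def word_error_rate(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> float:
    """Fraction of words whose predicted pronunciation is not an exact match."""
    wrong = sum(1 for r, h in zip(refs, hyps) if list(r) != list(h))
    return wrong / len(refs)


if __name__ == "__main__":
    # Hypothetical toy data: two words, one with a single substituted phoneme.
    refs = [["K", "AE", "T"], ["D", "AO", "G"]]
    hyps = [["K", "AE", "T"], ["D", "AA", "G"]]
    print(phoneme_error_rate(refs, hyps))  # 1 edit over 6 reference phonemes
    print(word_error_rate(refs, hyps))     # 1 wrong word out of 2
```

A single wrong phoneme makes the whole word count as an error, so word error rate is always at least as strict as phoneme error rate on the same predictions.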

@article{yolchuyeva_grapheme--phoneme_2019,
	title = {Grapheme-to-{Phoneme} {Conversion} with {Convolutional} {Neural} {Networks}},
	volume = {9},
	copyright = {http://creativecommons.org/licenses/by/3.0/},
	issn = {2076-3417},
	url = {https://www.mdpi.com/2076-3417/9/6/1143},
	doi = {10.3390/app9061143},
	abstract = {Grapheme-to-phoneme (G2P) conversion is the process of generating pronunciation for words based on their written form. It has a highly essential role for natural language processing, text-to-speech synthesis and automatic speech recognition systems. In this paper, we investigate convolutional neural networks (CNN) for G2P conversion. We propose a novel CNN-based sequence-to-sequence (seq2seq) architecture for G2P conversion. Our approach includes an end-to-end CNN G2P conversion with residual connections and, furthermore, a model that utilizes a convolutional neural network (with and without residual connections) as encoder and Bi-LSTM as a decoder. We compare our approach with state-of-the-art methods, including Encoder-Decoder LSTM and Encoder-Decoder Bi-LSTM. Training and inference times, phoneme and word error rates were evaluated on the public CMUDict dataset for US English, and the best performing convolutional neural network-based architecture was also evaluated on the NetTalk dataset. Our method approaches the accuracy of previous state-of-the-art results in terms of phoneme error rate.},
	language = {en},
	number = {6},
	urldate = {2022-10-08},
	journal = {Applied Sciences},
	author = {Yolchuyeva, Sevinj and Németh, Géza and Gyires-Tóth, Bálint},
	month = jan,
	year = {2019},
	note = {Number: 6
Publisher: Multidisciplinary Digital Publishing Institute},
	keywords = {1D convolution, Bi-LSTM, LSTM, encoder-decoder, grapheme-to-phoneme (G2P), residual architecture},
	pages = {1143},
}

