Multimodal Speech Processing Using Asynchronous Hidden Markov Models. Bengio, S. *Information Fusion*, 5(2):81–89, 2004.

This paper advocates that for some multimodal tasks involving more than one stream of data representing the same sequence of events, it might sometimes be a good idea to be able to *desynchronize* the streams in order to maximize their joint likelihood. We thus present a novel Hidden Markov Model architecture to model the joint probability of pairs of asynchronous sequences describing the same sequence of events. An Expectation-Maximization algorithm to train the model is presented, as well as a Viterbi decoding algorithm, which can be used to obtain the optimal state sequence as well as the alignment between the two sequences. The model was tested on two audio-visual speech processing tasks, namely speech recognition and text-dependent speaker verification, both using the M2VTS database. Robust performances under various noise conditions were obtained in both cases.
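The abstract mentions a Viterbi decoder that returns both the optimal state sequence and the alignment between the two streams. The sketch below shows one way such a decoder can be written, assuming a simplified version of the model: at each frame of the longer stream x, state i either emits x_t alone (probability 1 - ε_i) or emits x_t jointly with the next unused frame of the shorter stream y (probability ε_i). Emission log-likelihoods are passed in as precomputed tables. All names (`ahmm_viterbi`, `eps`, `log_px`, `log_pxy`) are hypothetical, and details such as end-state constraints are omitted; treat it as an illustration, not the paper's implementation.

```python
import math
from typing import List, Optional, Sequence, Tuple

NEG_INF = float("-inf")


def safe_log(p: float) -> float:
    """log(p), returning -inf instead of raising when p == 0."""
    return math.log(p) if p > 0 else NEG_INF


def ahmm_viterbi(
    log_pi: Sequence[float],                        # log P(q_0 = i)
    log_A: Sequence[Sequence[float]],               # log P(q_t = i | q_{t-1} = j), indexed [j][i]
    eps: Sequence[float],                           # assumed P(state i also emits a y frame)
    log_px: Sequence[Sequence[float]],              # log p(x_t | i), indexed [i][t]
    log_pxy: Sequence[Sequence[Sequence[float]]],   # log p(x_t, y_s | i), indexed [i][t][s]
    T: int,                                         # length of the longer stream x
    S: int,                                         # length of the shorter stream y (S <= T)
) -> Tuple[float, List[int], List[Optional[int]]]:
    """Return (best log-prob, state path, alignment); alignment[t] is the index
    of the y frame emitted together with x_t, or None if x_t was emitted alone."""
    N = len(log_pi)
    # V[t][s][i]: best log-prob of emitting x_0..x_t plus exactly s frames of y, ending in state i.
    V = [[[NEG_INF] * N for _ in range(S + 1)] for _ in range(T)]
    back = [[[None] * N for _ in range(S + 1)] for _ in range(T)]
    for i in range(N):
        V[0][0][i] = log_pi[i] + safe_log(1 - eps[i]) + log_px[i][0]
        if S >= 1:
            V[0][1][i] = log_pi[i] + safe_log(eps[i]) + log_pxy[i][0][0]
    for t in range(1, T):
        for s in range(0, min(t + 1, S) + 1):
            for i in range(N):
                best, arg = NEG_INF, None
                # Case 1: x_t emitted alone, so the y counter s is unchanged.
                for j in range(N):
                    cand = V[t - 1][s][j] + log_A[j][i] + safe_log(1 - eps[i]) + log_px[i][t]
                    if cand > best:
                        best, arg = cand, (j, s)
                # Case 2: x_t emitted jointly with y_{s-1}, so the y counter advances.
                if s >= 1:
                    for j in range(N):
                        cand = (V[t - 1][s - 1][j] + log_A[j][i]
                                + safe_log(eps[i]) + log_pxy[i][t][s - 1])
                        if cand > best:
                            best, arg = cand, (j, s - 1)
                V[t][s][i], back[t][s][i] = best, arg
    # Every frame of both streams must be consumed by the end of x.
    last = max(range(N), key=lambda k: V[T - 1][S][k])
    score = V[T - 1][S][last]
    if score == NEG_INF:
        raise ValueError("no path consumes all S frames of y within T steps")
    states: List[int] = [0] * T
    align: List[Optional[int]] = [None] * T
    i, s = last, S
    for t in range(T - 1, -1, -1):
        states[t] = i
        if t == 0:
            align[0] = 0 if s == 1 else None
            break
        j, prev_s = back[t][s][i]
        align[t] = s - 1 if prev_s == s - 1 else None
        i, s = j, prev_s
    return score, states, align
```

For example, with a single state, ε = 0.5, and emission tables that favour pairing x_0 with y_0 and x_2 with y_1, the decoder returns the alignment `[0, None, 1]`, that is, the second x frame is emitted without any y frame. The cost is O(T · S · N²), since the usual HMM Viterbi lattice gains one extra dimension for the position in y.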

@article{bengio:2004:if,
  author = {S. Bengio},
  title = {Multimodal Speech Processing Using Asynchronous Hidden Markov Models},
  journal = {Information Fusion},
  volume = 5,
  number = 2,
  pages = {81--89},
  year = 2004,
  url = {publications/ps/bengio_2004_if.ps.gz},
  pdf = {publications/pdf/bengio_2004_if.pdf},
  djvu = {publications/djvu/bengio_2004_if.djvu},
  original = {2004/ahmm_if/IF02B04-RSP},
  web = {http://dx.doi.org/10.1016/j.inffus.2003.04.001},
  topics = {speech,multimodal,biometric_authentication},
  abstract = {This paper advocates that for some multimodal tasks involving more than one stream of data representing the same sequence of events, it might sometimes be a good idea to be able to {\em desynchronize} the streams in order to maximize their joint likelihood. We thus present a novel Hidden Markov Model architecture to model the joint probability of pairs of asynchronous sequences describing the same sequence of events. An Expectation-Maximization algorithm to train the model is presented, as well as a Viterbi decoding algorithm, which can be used to obtain the optimal state sequence as well as the alignment between the two sequences. The model was tested on two audio-visual speech processing tasks, namely speech recognition and text-dependent speaker verification, both using the M2VTS database. Robust performances under various noise conditions were obtained in both cases.},
  categorie = {A}
}

