Multimodal Speech Processing Using Asynchronous Hidden Markov Models. Bengio, S. Information Fusion, 5(2):81–89, 2004.
This paper advocates that for some multimodal tasks involving more than one stream of data representing the same sequence of events, it might sometimes be a good idea to be able to desynchronize the streams in order to maximize their joint likelihood. We thus present a novel Hidden Markov Model architecture to model the joint probability of pairs of asynchronous sequences describing the same sequence of events. An Expectation-Maximization algorithm to train the model is presented, as well as a Viterbi decoding algorithm, which can be used to obtain the optimal state sequence as well as the alignment between the two sequences. The model was tested on two audio-visual speech processing tasks, namely speech recognition and text-dependent speaker verification, both using the M2VTS database. Robust performances under various noise conditions were obtained in both cases.
@article{bengio:2004:if,
author = {S. Bengio},
title = {Multimodal Speech Processing Using Asynchronous Hidden Markov Models},
journal = {Information Fusion},
volume = 5,
number = 2,
pages = {81--89},
year = 2004,
url = {publications/ps/bengio_2004_if.ps.gz},
pdf = {publications/pdf/bengio_2004_if.pdf},
djvu = {publications/djvu/bengio_2004_if.djvu},
original = {2004/ahmm_if/IF02B04-RSP},
web = {http://dx.doi.org/10.1016/j.inffus.2003.04.001},
topics = {speech,multimodal,biometric_authentication},
abstract = {This paper advocates that for some multimodal tasks involving more than one stream of data representing the same sequence of events, it might sometimes be a good idea to be able to {\em desynchronize} the streams in order to maximize their joint likelihood. We thus present a novel Hidden Markov Model architecture to model the joint probability of pairs of asynchronous sequences describing the same sequence of events. An Expectation-Maximization algorithm to train the model is presented, as well as a Viterbi decoding algorithm, which can be used to obtain the optimal state sequence as well as the alignment between the two sequences. The model was tested on two audio-visual speech processing tasks, namely speech recognition and text-dependent speaker verification, both using the M2VTS database. Robust performances under various noise conditions were obtained in both cases.},
categorie = {A}
}