Multimodal Speech Processing Using Asynchronous Hidden Markov Models

Multimodal Speech Processing Using Asynchronous Hidden Markov Models. Bengio, S. Information Fusion, 5(2):81–89, 2004.

This paper argues that, for some multimodal tasks involving more than one stream of data representing the same sequence of events, it can be useful to desynchronize the streams in order to maximize their joint likelihood. We thus present a novel Hidden Markov Model architecture that models the joint probability of pairs of asynchronous sequences describing the same sequence of events. An Expectation-Maximization algorithm to train the model is presented, as well as a Viterbi decoding algorithm, which can be used to obtain both the optimal state sequence and the alignment between the two sequences. The model was tested on two audio-visual speech processing tasks, namely speech recognition and text-dependent speaker verification, both using the M2VTS database. Robust performance under various noise conditions was obtained in both cases.
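The abstract does not spell out the decoding recursion, so the following is only a minimal sketch of how Viterbi decoding over two asynchronous streams could look, not the paper's exact algorithm. It assumes (these are illustrative assumptions, not details taken from the text above) that the longer stream x has T frames, the shorter stream y has S ≤ T frames, and that at each time step a state either emits x_t alone or emits the pair (x_t, y_s) with a per-state probability eps strictly between 0 and 1. The function name `async_viterbi` and all parameter names are hypothetical.

```python
import math

NEG_INF = float("-inf")

def async_viterbi(log_init, log_trans, log_eps, log_px, log_pxy):
    """Best state path and x/y alignment under the assumed emission scheme.

    log_init[i]      : log P(q_0 = i)
    log_trans[j][i]  : log P(q_t = i | q_{t-1} = j)
    log_eps[i]       : log prob. that state i also emits a y frame (0 < eps < 1)
    log_px[t][i]     : log p(x_t | state i)
    log_pxy[t][s][i] : log p(x_t, y_s | state i)
    """
    T, N, S = len(log_px), len(log_init), len(log_pxy[0])
    log_no_y = [math.log1p(-math.exp(e)) for e in log_eps]

    # delta[t][s][i]: best log score ending at time t in state i,
    # with the first s frames of y already emitted.
    delta = [[[NEG_INF] * N for _ in range(S + 1)] for _ in range(T)]
    # back[t][s][i]: (previous state, whether a y frame was emitted at t)
    back = [[[(0, False)] * N for _ in range(S + 1)] for _ in range(T)]

    for i in range(N):
        delta[0][0][i] = log_init[i] + log_no_y[i] + log_px[0][i]
        if S > 0:
            delta[0][1][i] = log_init[i] + log_eps[i] + log_pxy[0][0][i]
            back[0][1][i] = (0, True)

    for t in range(1, T):
        for s in range(S + 1):
            for i in range(N):
                # Option 1: state i emits only x_t (s is unchanged).
                j = max(range(N), key=lambda k: delta[t - 1][s][k] + log_trans[k][i])
                best = delta[t - 1][s][j] + log_trans[j][i] + log_no_y[i] + log_px[t][i]
                choice = (j, False)
                if s > 0:
                    # Option 2: state i emits the pair (x_t, y_{s-1}).
                    j = max(range(N), key=lambda k: delta[t - 1][s - 1][k] + log_trans[k][i])
                    cand = (delta[t - 1][s - 1][j] + log_trans[j][i]
                            + log_eps[i] + log_pxy[t][s - 1][i])
                    if cand > best:
                        best, choice = cand, (j, True)
                delta[t][s][i], back[t][s][i] = best, choice

    # Backtrack from the best final state with all S frames of y emitted.
    state = max(range(N), key=lambda k: delta[T - 1][S][k])
    score, s = delta[T - 1][S][state], S
    states, align = [], []
    for t in range(T - 1, -1, -1):
        prev, emitted = back[t][s][state]
        states.append(state)
        align.append(s - 1 if emitted else -1)  # index of y frame, or -1
        if emitted:
            s -= 1
        state = prev
    return score, states[::-1], align[::-1]
```

The returned `align` list has one entry per x frame: the index of the y frame emitted together with it, or -1 when only x was emitted. For a single-state model with three x frames and one y frame whose joint likelihood is highest at the middle frame, the sketch returns the alignment `[-1, 0, -1]`. The table has T·(S+1)·N cells, each maximized over N predecessors, so the cost grows as O(T·S·N²).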

@article{bengio:2004:if,
  author = {S. Bengio},
  title = {Multimodal Speech Processing Using Asynchronous Hidden Markov Models},
  journal = {Information Fusion},
  volume = 5,
  number = 2,
  pages = {81--89},
  year = 2004,
  url = {publications/ps/bengio_2004_if.ps.gz},
  pdf = {publications/pdf/bengio_2004_if.pdf},
  djvu = {publications/djvu/bengio_2004_if.djvu},
  original = {2004/ahmm_if/IF02B04-RSP},
  web = {http://dx.doi.org/10.1016/j.inffus.2003.04.001},
  topics = {speech,multimodal,biometric_authentication},
  abstract = {This paper advocates that for some multimodal tasks involving more than one stream of data representing the same sequence of events, it might sometimes be a good idea to be able to {\em desynchronize} the streams in order to maximize their joint likelihood.  We thus present a novel Hidden Markov Model architecture to model the joint probability of pairs of asynchronous sequences describing the same sequence of events.  An Expectation-Maximization algorithm to train the model is presented, as well as a Viterbi decoding algorithm, which can be used to obtain the optimal state sequence as well as the alignment between the two sequences.  The model was tested on two audio-visual speech processing tasks, namely speech recognition and text-dependent speaker verification, both using the M2VTS database. Robust performances under various noise conditions were obtained in both cases.},
  categorie = {A}
}
