A phone-viseme dynamic Bayesian network for audio-visual automatic speech recognition

A phone-viseme dynamic Bayesian network for audio-visual automatic speech recognition. Terry, L. & Katsaggelos, A. K. In 2008 19th International Conference on Pattern Recognition, pages 1–4, Dec. 2008. IEEE.


This work extends and improves a recently introduced (Dec. 2007) dynamic Bayesian network (DBN) based audio-visual automatic speech recognition (AVASR) system. That system models the audio and visual components of speech as being composed of the same sub-word units when, in fact, this is not psycholinguistically true. We extend the system to model the audio and visual streams as being composed of separate, yet related, sub-word units. We also introduce a novel stream weighting structure incorporated into the model itself, and report recognition accuracy on a large vocabulary continuous speech recognition (LVCSR) task. The "best" performing proposed system attains a WER of 66.71% whereas the "best" baseline system performs at a WER of 64.30%. The proposed system also improves accuracy to 45.95% from 39.40%. © 2008 IEEE.
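The idea of separate yet related sub-word units with a stream weight can be illustrated with a minimal sketch. It is not the paper's DBN or its weighting structure: it assumes a conventional log-linear fusion of per-stream log-likelihoods, and the `PHONE_TO_VISEME` table, the function names, and the `audio_weight` parameter are all hypothetical.

```python
# Hypothetical many-to-one phone-to-viseme table (illustrative only, not
# taken from the paper): several phones share one visual unit.
PHONE_TO_VISEME = {
    "p": "bilabial",
    "b": "bilabial",
    "m": "bilabial",
    "f": "labiodental",
    "v": "labiodental",
}


def combined_log_likelihood(audio_ll, visual_ll, audio_weight):
    """Fuse two stream log-likelihoods with a single weight in [0, 1].

    This is the conventional log-linear stream weighting, assumed here
    for illustration; the visual stream gets weight (1 - audio_weight).
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return audio_weight * audio_ll + (1.0 - audio_weight) * visual_ll


def best_phone(audio_lls, visual_lls, audio_weight):
    """Return the phone with the highest fused score.

    audio_lls maps phone -> audio log-likelihood.
    visual_lls maps viseme -> visual log-likelihood; each phone is scored
    against the viseme it maps to, so the two streams use different units.
    """
    scores = {
        phone: combined_log_likelihood(
            audio_ll, visual_lls[PHONE_TO_VISEME[phone]], audio_weight
        )
        for phone, audio_ll in audio_lls.items()
    }
    return max(scores, key=scores.get)
```

With `audio_weight` near 1 the decision follows the audio stream alone; lowering it lets the viseme evidence override acoustically confusable phones, which is the usual motivation for weighting the streams.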

@inproceedings{Louis2008,
abstract = {This work extends and improves a recently introduced (Dec. 2007) dynamic Bayesian network (DBN) based audio-visual automatic speech recognition (AVASR) system. That system models the audio and visual components of speech as being composed of the same sub-word units when, in fact, this is not psycholinguistically true. We extend the system to model the audio and visual streams as being composed of separate, yet related, sub-word units. We also introduce a novel stream weighting structure incorporated into the model itself, and report recognition accuracy on a large vocabulary continuous speech recognition (LVCSR) task. The "best" performing proposed system attains a WER of 66.71% whereas the "best" baseline system performs at a WER of 64.30%. The proposed system also improves accuracy to 45.95% from 39.40%. {\textcopyright} 2008 IEEE.},
author = {Terry, Louis and Katsaggelos, Aggelos K.},
booktitle = {2008 19th International Conference on Pattern Recognition},
doi = {10.1109/ICPR.2008.4761927},
isbn = {978-1-4244-2174-9},
issn = {1051-4651},
month = {dec},
pages = {1--4},
publisher = {IEEE},
title = {{A phone-viseme dynamic Bayesian network for audio-visual automatic speech recognition}},
url = {http://ieeexplore.ieee.org/document/4761927/},
year = {2008}
}
