Improving speech recognition performance via phone-dependent VQ codebooks and adaptive language models in SPHINX-II. Hwang, M., Rosenfeld, R., Thayer, E., Mosur, R., Chase, L., Weide, R., Huang, X., & Alleva, F. In Int'l Conf. on Acoustics, Speech and Signal Processing, April 1994, Australia.
This paper presents improvements in acoustic and language modeling for automatic speech recognition. Specifically, semi-continuous HMMs (SCHMMs) with phone-dependent VQ codebooks are presented and incorporated into the SPHINX-II speech recognition system. The phone-dependent VQ codebooks relax the density-tying constraint in SCHMMs in order to obtain more detailed models. A 6% error rate reduction was achieved on the speaker-independent 20,000-word Wall Street Journal (WSJ) task. Dynamic adaptation of the language model in the context of long documents is also explored. A maximum entropy framework is used to exploit long-distance trigrams and trigger effects. A 10%-15% word error rate reduction is reported on the same WSJ task using the adaptive language modeling technique.
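To make the codebook idea concrete: in a standard SCHMM every state's output density is a mixture over one codebook of Gaussians shared by all states, whereas a phone-dependent codebook gives each phone class its own set of densities. The toy sketch below illustrates this relaxation; all names, codebook sizes, and parameter values are illustrative assumptions, not taken from the paper's implementation.

```python
import math

def gauss(x, mean, var):
    """1-D Gaussian density (real SCHMMs use multivariate mixtures)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical phone-dependent codebooks: phone label -> list of (mean, var).
# In a plain SCHMM there would be a single shared codebook instead.
codebooks = {
    "AA": [(0.0, 1.0), (2.0, 0.5)],
    "IY": [(1.0, 0.8), (3.0, 1.2)],
}

def state_output_prob(x, phone, mixture_weights):
    """SCHMM state output probability: a mixture over the state's own
    phone's codebook, so states of different phones no longer tie densities."""
    return sum(w * gauss(x, m, v)
               for w, (m, v) in zip(mixture_weights, codebooks[phone]))

# Same observation scored against two different phones' codebooks.
p_aa = state_output_prob(0.5, "AA", [0.7, 0.3])
p_iy = state_output_prob(0.5, "IY", [0.5, 0.5])
```

Because each phone keeps separate density parameters, the same acoustic observation yields different likelihoods under different phones' codebooks, which is the extra modeling detail the tying relaxation buys.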
