Combining Residual Networks with LSTMs for Lipreading. Stafylakis, T. & Tzimiropoulos, G. arXiv:1703.04105 [cs], March 2017.
@article{stafylakis_combining_2017,
	title = {Combining {Residual} {Networks} with {LSTMs} for {Lipreading}},
	url = {http://arxiv.org/abs/1703.04105},
	abstract = {We propose an end-to-end deep learning architecture for word-level visual speech recognition. The system is a combination of spatiotemporal convolutional, residual and bidirectional Long Short-Term Memory networks. We trained and evaluated it on the Lipreading In-The-Wild benchmark, a challenging database of 500-size vocabulary consisting of video excerpts from BBC TV broadcasts. The proposed network attains word accuracy equal to 83.0\%, yielding 6.8\% absolute improvement over the current state-of-the-art.},
	urldate = {2017-03-30},
	journal = {arXiv:1703.04105 [cs]},
	author = {Stafylakis, Themos and Tzimiropoulos, Georgios},
	month = mar,
	year = {2017},
	note = {arXiv: 1703.04105},
	keywords = {Computer Science - Computer Vision and Pattern Recognition}
}
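The abstract describes a pipeline of a spatiotemporal (3D) convolutional front-end, a residual network, and a bidirectional LSTM feeding a 500-way word classifier. A minimal sketch of the tensor-shape flow through such a pipeline is below; all layer widths and the input clip size are illustrative assumptions, not values taken from the paper.

```python
def shape_flow(frames=29, height=112, width=112):
    """Illustrative shape flow for a 3D-conv -> ResNet -> BiLSTM lipreading
    pipeline. Sizes are assumptions for illustration only."""
    # input: grayscale mouth-region video clip (T, H, W)
    x = (frames, height, width)
    # spatiotemporal (3D) convolution: keeps the time axis, downsamples space
    x = (frames, 64, height // 4, width // 4)   # (T, C, H', W')
    # 2D residual network applied per frame: collapses space to a feature vector
    x = (frames, 512)                           # (T, D)
    # bidirectional LSTM: forward and backward hidden states concatenated
    x = (frames, 2 * 256)                       # (T, 2 * hidden)
    # temporal pooling + linear classifier over the 500-word vocabulary
    x = (500,)
    return x
```

The end state is a single 500-dimensional score vector, matching the word-level (rather than sequence-level) recognition task the entry describes.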
