An image-based visual speech animation system. Zhou, Z., Zhao, G., Guo, Y., & Pietikäinen, M. IEEE Transactions on Circuits and Systems for Video Technology, 22(10):1420-1432, 2012.
An image-based visual speech animation system is presented in this paper. A video model is proposed to preserve the video dynamics of a talking face. The model represents a video sequence by a low-dimensional continuous curve embedded in a path graph and establishes a map from the curve to the image domain. When selecting video segments for synthesis, we loosen the traditional requirement of using triphone as the unit to allow segments to contain longer natural talking motion. Dense videos are sampled from the segments, concatenated and downsampled to train a video model which enables efficient time-alignment and motion smoothing for the final video synthesis. Different viseme definitions are used to investigate the impact of visemes on the video realism of the animated talking face. The system is built upon a public database and tested both objectively and subjectively.
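The core of the video model is the embedding of a frame sequence as a low-dimensional continuous curve via a path graph. A standard way to realize such an embedding is Laplacian eigenmaps on the path graph, whose Laplacian eigenvectors have a closed form (the DCT-II basis), so the curve needs no numerical eigendecomposition. The sketch below is a minimal illustration under that assumption; the function names `path_graph_embedding` and `map_curve_to_images` are hypothetical, and the inverse-distance interpolation is a simple stand-in for the paper's learned curve-to-image map, not the authors' exact construction.

```python
import numpy as np

def path_graph_embedding(n_frames: int, dim: int) -> np.ndarray:
    """Embed n_frames video frames as points on a smooth curve using the
    eigenvectors of the path graph's Laplacian (Laplacian eigenmaps).

    For a path graph with n nodes, the Laplacian eigenvectors have the
    closed form u_k(i) = cos(pi * k * (i + 0.5) / n)  (the DCT-II basis),
    so no numerical eigendecomposition is needed.
    """
    i = np.arange(n_frames)
    # Skip k = 0 (the constant eigenvector); keep the `dim` smoothest modes.
    ks = np.arange(1, dim + 1)
    return np.cos(np.pi * np.outer(i + 0.5, ks) / n_frames)  # (n_frames, dim)

def map_curve_to_images(curve: np.ndarray, frames: np.ndarray,
                        query_points: np.ndarray) -> np.ndarray:
    """Map points on the embedded curve back to the image domain by
    inverse-distance interpolation between the embedded training frames
    (a simple stand-in for a learned curve-to-image map)."""
    # Distance from each query point to each embedded frame.
    d = np.linalg.norm(query_points[:, None, :] - curve[None, :, :], axis=-1)
    w = 1.0 / (d + 1e-8)                       # inverse-distance weights
    w /= w.sum(axis=1, keepdims=True)
    flat = frames.reshape(len(frames), -1)     # (n_frames, H*W)
    out = w @ flat                             # blend frames per query point
    return out.reshape((len(query_points),) + frames.shape[1:])

# Example: 40 frames of 32x32 grayscale "video", embedded in 3-D.
frames = np.random.rand(40, 32, 32)
curve = path_graph_embedding(len(frames), dim=3)

# Resample the curve at a finer temporal rate: because the embedding is a
# continuous curve, time alignment and motion smoothing reduce to
# interpolating along it.
t = np.linspace(0, len(frames) - 1, 100)
fine = np.stack([np.interp(t, np.arange(len(frames)), curve[:, d])
                 for d in range(curve.shape[1])], axis=1)
synth = map_curve_to_images(curve, frames, fine)   # (100, 32, 32)
```

Because the path-graph eigenvectors are slowly varying cosines, consecutive frames land close together on the curve, which is what lets the abstract's time-alignment and motion-smoothing steps operate in the low-dimensional space rather than on raw pixels.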
@article{mvg:1636,
 title = {An image-based visual speech animation system},
 author = {Zhou, Z. and Zhao, G. and Guo, Y. and Pietikäinen, M.},
 journal = {IEEE Transactions on Circuits and Systems for Video Technology},
 year = {2012},
 volume = {22},
 number = {10},
 pages = {1420--1432}
}
