Speech driven 3D head gesture synthesis. Sargin, M. E., Erzin, E., Yemez, Y., Tekalp, A. M., & Erdem, A. T. In 2006 IEEE 14th Signal Processing and Communications Applications, Vols 1 and 2, pages 237-240, 2006. IEEE. IEEE 14th Signal Processing and Communications Applications, Antalya, TURKEY, APR 16-19, 2006
In this paper, we present a speech driven natural head gesture analysis and synthesis system. The proposed system assumes that sharp head movements are correlated with prominence in speech. For analysis, a binocular camera system is employed to capture the head motion of a talking person. The motion parameters associated with the 3D head motion are then used for extraction of the repetitive head gestures. In parallel, prosodic events are detected using an HMM structure with pitch and formant frequencies and speech intensity as audio features. For synthesis, the head motion parameters are estimated from the prosodic events based on a gesture-speech correlation model, and the associated Euler angles are then used for speech driven animation of a 3D personalized talking head model. Results on head motion feature extraction, prosodic event detection and correlation modelling are provided.
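For a concrete starting point on the audio side of such a pipeline, below is a minimal sketch (not the authors' implementation) of per-frame pitch, first-formant, and intensity features fed to a two-state HMM for unsupervised prosodic event detection. The library choices (librosa, hmmlearn), the input file name, and all parameter values are illustrative assumptions.

import numpy as np
import librosa
from hmmlearn import hmm

def prosodic_features(y, sr, frame_length=1024, hop_length=256):
    """Per-frame pitch (Hz), first-formant estimate (Hz), and intensity (dB)."""
    # Pitch track via probabilistic YIN; unvoiced frames come back as NaN.
    f0, _, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr,
                            frame_length=frame_length, hop_length=hop_length)
    f0 = np.nan_to_num(f0)

    # Intensity as frame RMS energy converted to dB.
    rms = librosa.feature.rms(y=y, frame_length=frame_length,
                              hop_length=hop_length)[0]
    intensity = librosa.amplitude_to_db(rms, ref=np.max)

    # Rough first-formant estimate per frame from LPC pole angles.
    frames = librosa.util.frame(y, frame_length=frame_length,
                                hop_length=hop_length)
    f1 = []
    for frame in frames.T:
        a = librosa.lpc(frame * np.hanning(frame_length), order=12)
        roots = np.roots(a)
        roots = roots[np.imag(roots) > 0]             # keep upper half-plane poles
        freqs = np.sort(np.angle(roots) * sr / (2 * np.pi))
        f1.append(freqs[0] if len(freqs) else 0.0)    # lowest resonance ~ F1
    f1 = np.array(f1)

    # Trim all tracks to a common number of frames before stacking.
    n = min(len(f0), len(intensity), len(f1))
    return np.column_stack([f0[:n], f1[:n], intensity[:n]])

# Two-state HMM: one state for neutral speech, one for prosodic prominence.
y, sr = librosa.load("speech.wav", sr=16000)          # hypothetical input file
X = prosodic_features(y, sr)
model = hmm.GaussianHMM(n_components=2, covariance_type="diag", n_iter=50)
model.fit(X)                                          # unsupervised fit
events = model.predict(X)                             # per-frame state labels

In the paper's setting, frames labelled as prominent would then drive the gesture-speech correlation model that maps prosodic events to head-motion Euler angles; that mapping is specific to their trained model and is not sketched here.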
@inproceedings{ISI:000245347800061,
  author = {Sargin, M. E. and Erzin, E. and Yemez, Y. and Tekalp, A. M. and Erdem, A. Tanju},
  title = {Speech driven {3D} head gesture synthesis},
  booktitle = {2006 IEEE 14th Signal Processing and Communications Applications, Vols 1 and 2},
  year = {2006},
  pages = {237--240},
  organization = {IEEE},
  note = {IEEE 14th Signal Processing and Communications Applications, Antalya, Turkey, April 16--19, 2006},
  abstract = {In this paper, we present a speech driven natural head gesture
    analysis and synthesis system. The proposed system assumes that sharp
    head movements are correlated with prominence in speech. For analysis, a
    binocular camera system is employed to capture the head motion of a
    talking person. The motion parameters associated with the 3D head motion
    are then used for extraction of the repetitive head gestures. In
    parallel, prosodic events are detected using an HMM structure with pitch
    and formant frequencies and speech intensity as audio features. For
    synthesis, the head motion parameters are estimated from the prosodic
    events based on a gesture-speech correlation model, and the associated
    Euler angles are then used for speech driven animation of a 3D
    personalized talking head model. Results on head motion feature
    extraction, prosodic event detection and correlation modelling are
    provided.},
  isbn = {978-1-4244-0238-0},
  researcherid-numbers = {Erzin, Engin/H-1716-2011},
  orcid-numbers = {Erzin, Engin/0000-0002-2715-2368},
  unique-id = {ISI:000245347800061},
}
