Speech driven 3D head gesture synthesis. Sargin, M. E., Erzin, E., Yemez, Y., Tekalp, A. M., & Erdem, A. T. In 2006 IEEE 14th Signal Processing and Communications Applications, Vols 1 and 2, pages 237-240, 2006. IEEE. Presented at the IEEE 14th Signal Processing and Communications Applications, Antalya, Turkey, Apr 16-19, 2006.

Abstract: In this paper, we present a speech driven natural head gesture analysis and synthesis system. The proposed system assumes that sharp head movements are correlated with prominence in speech. For analysis, a binocular camera system is employed to capture the head motion of a talking person. The motion parameters associated with the 3D head motion are then used for extraction of the repetitive head gestures. In parallel, prosodic events are detected using an HMM structure with pitch and formant frequencies and speech intensity as audio features. For synthesis, the head motion parameters are estimated from the prosodic events based on a gesture-speech correlation model and then the associated Euler angles are used for speech driven animation of a 3D personalized talking head model. Results on head motion feature extraction, prosodic event detection and correlation modelling are provided.
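The synthesis stage described in the abstract drives a personalized talking-head model from estimated Euler angles. As an illustrative sketch only (not code from the paper), the final animation step amounts to composing a rotation matrix from yaw/pitch/roll and applying it to the head mesh vertices; the Z-Y-X convention below is an assumption:

```python
import numpy as np

def euler_to_rotation(yaw: float, pitch: float, roll: float) -> np.ndarray:
    """Compose a 3x3 rotation matrix from Euler angles (assumed Z-Y-X order)."""
    cz, sz = np.cos(yaw), np.sin(yaw)
    cy, sy = np.cos(pitch), np.sin(pitch)
    cx, sx = np.cos(roll), np.sin(roll)
    Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    return Rz @ Ry @ Rx

def rotate_head(vertices: np.ndarray, yaw: float, pitch: float, roll: float) -> np.ndarray:
    """Apply the head rotation to an (N, 3) array of mesh vertices."""
    return vertices @ euler_to_rotation(yaw, pitch, roll).T
```

In practice the per-frame angle triples would come from the gesture-speech correlation model; here they are free parameters.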
@inproceedings{ ISI:000245347800061,
Author = {Sargin, M. E. and Erzin, E. and Yemez, Y. and Tekalp, A. M. and Erdem,
A. Tanju},
Book-Group-Author = {{IEEE}},
Title = {{Speech driven 3D head gesture synthesis}},
Booktitle = {{2006 IEEE 14th Signal Processing and Communications Applications, Vols 1
and 2}},
Year = {{2006}},
Pages = {{237-240}},
Note = {{IEEE 14th Signal Processing and Communications Applications, Antalya,
TURKEY, APR 16-19, 2006}},
Organization = {{IEEE}},
Abstract = {{In this paper, we present a speech driven natural head gesture analysis
and synthesis system. The proposed system assumes that sharp head
movements are correlated with prominence in speech. For analysis, a
binocular camera system is employed to capture the head motion of a
talking person. The motion parameters associated with the 3D head motion
are then used for extraction of the repetitive head gestures. In
parallel, prosodic events are detected using an HMM structure with pitch
and formant frequencies and speech intensity as audio features. For
synthesis, the head motion parameters are estimated from the prosodic
events based on a gesture-speech correlation model and then the
associated Euler angles are used for speech driven animation of a 3D
personalized talking head model. Results on head motion feature
extraction, prosodic event detection and correlation modelling are
provided.}},
ISBN = {{978-1-4244-0238-0}},
ResearcherID-Numbers = {{Erzin, Engin/H-1716-2011}},
ORCID-Numbers = {{Erzin, Engin/0000-0002-2715-2368}},
Unique-ID = {{ISI:000245347800061}},
}