Combined gesture-speech analysis and speech driven gesture synthesis. Sargin, M. E., Aran, O., Karpov, A., Ofli, F., Yasinnik, Y., Wilson, S., Erzin, E., Yemez, Y., & Tekalp, A. M. In 2006 IEEE International Conference on Multimedia and Expo - ICME 2006, Vols 1-5, Proceedings, pages 893-896, 2006. Presented at the IEEE International Conference on Multimedia and Expo (ICME 2006), Toronto, Canada, Jul 9-12, 2006. doi: 10.1109/ICME.2006.262663

Abstract: Multimodal speech and speaker modeling and recognition are widely accepted as vital aspects of state-of-the-art human-machine interaction systems. While correlations between speech and lip motion as well as speech and facial expressions are widely studied, relatively little work has been done to investigate the correlations between speech and gesture. Detection and modeling of head, hand, and arm gestures of a speaker have been studied extensively, and these gestures have been shown to carry linguistic information; a typical example is the head gesture made while saying "yes/no". In this study, the correlation between gestures and speech is investigated. In speech signal analysis, keyword spotting and prosodic accent event detection have been performed. In gesture analysis, hand positions and parameters of global head motion are used as features. The detection of gestures is based on discrete pre-designated symbol sets, which are manually labeled during the training phase. The gesture-speech correlation is modelled by examining the co-occurring speech and gesture patterns. This correlation can be used to fuse the gesture and speech modalities in edutainment applications (e.g., video games, 3-D animations) where the natural gestures of talking avatars are animated from speech. A speech driven gesture animation example has been implemented for demonstration.
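To make the co-occurrence modeling described in the abstract concrete, the following is a minimal illustrative sketch in Python, not the authors' implementation: it counts how often hypothetical discrete speech-event labels (keywords, prosodic accents) co-occur with hypothetical gesture labels in time-aligned training annotations, and then "synthesizes" a gesture sequence for new speech events by looking up the most frequent co-occurring gesture. All labels, data, and function names are invented for illustration; the paper's actual correlation model and animation pipeline may differ substantially.

from collections import Counter, defaultdict

# Hypothetical, manually labeled training data: time-aligned discrete speech
# events (keywords / prosodic accents) and gesture symbols. Invented for
# illustration only.
speech_events  = ["yes", "no", "accent", "yes", "no", "accent", "accent"]
gesture_labels = ["nod", "shake", "hand_raise", "nod", "shake", "hand_raise", "nod"]

# Model the gesture-speech correlation as a simple co-occurrence table.
cooccurrence = defaultdict(Counter)
for speech, gesture in zip(speech_events, gesture_labels):
    cooccurrence[speech][gesture] += 1

def most_likely_gesture(speech_event):
    """Return the gesture label that most often co-occurred with this speech event."""
    counts = cooccurrence.get(speech_event)
    if not counts:
        return None  # unseen speech event: no gesture to suggest
    return counts.most_common(1)[0][0]

# Toy "speech-driven synthesis": map detected speech events to gestures
# that could drive a talking-avatar animation.
detected_events = ["yes", "accent", "no"]
gesture_sequence = [most_likely_gesture(event) for event in detected_events]
print(gesture_sequence)  # ['nod', 'hand_raise', 'shake'] with the toy data above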
@inproceedings{ ISI:000245384802004,
Author = {Sargin, M. E. and Aran, O. and Karpov, A. and Ofli, F. and Yasinnik, Y.
and Wilson, S. and Erzin, E. and Yemez, Y. and Tekalp, A. M.},
Book-Group-Author = {{IEEE}},
Title = {{Combined gesture-speech analysis and speech driven gesture synthesis}},
Booktitle = {{2006 IEEE International Conference on Multimedia and Expo - ICME 2006,
Vols 1-5, Proceedings}},
Year = {{2006}},
Pages = {{893-896}},
Note = {{IEEE International Conference on Multimedia and Expo (ICME 2006),
Toronto, CANADA, JUL 09-12, 2006}},
Organization = {{IEEE; IEEE Circuits \& Syst Soc; IEEE Commun Soc; IEEE Comp Soc; IEEE
Signal Proc Soc}},
Abstract = {{Multimodal speech and speaker modeling and recognition are widely
accepted as vital aspects of state of the art human-machine interaction
systems. While correlations between speech and lip motion as well as
speech and facial expressions are widely studied, relatively little work
has been done to investigate the correlations between speech and
gesture.
Detection and modeling of head, hand and arm gestures of a speaker have
been studied extensively and these gestures were shown to carry
linguistic information. A typical example is the head gesture while
saying ``yes/no{''}. In this study, correlation between gestures and
speech is investigated. In speech signal analysis, keyword spotting and
prosodic accent event detection have been performed. In gesture analysis,
hand positions and parameters of global head motion are used as
features. The detection of gestures is based on discrete pre-designated
symbol sets, which are manually labeled during the training phase. The
gesture-speech correlation is modelled by examining the co-occurring
speech and gesture patterns. This correlation can be used to fuse
gesture and speech modalities for edutainment applications (e.g. video
games, 3-D animations) where natural gestures of talking avatars are
animated from speech. A speech driven gesture animation example has been
implemented for demonstration.}},
DOI = {{10.1109/ICME.2006.262663}},
ISBN = {{978-1-4244-0366-0}},
ResearcherID-Numbers = {{Erzin, Engin/H-1716-2011
Karpov, Alexey/A-8905-2012}},
ORCID-Numbers = {{Erzin, Engin/0000-0002-2715-2368
Karpov, Alexey/0000-0003-3424-652X}},
Unique-ID = {{ISI:000245384802004}},
}