Discriminative analysis of lip motion features for speaker identification and speech-reading. Cetinguel, H. E., Yemez, Y., Erzin, E., & Tekalp, A. M. IEEE Transactions on Image Processing, 15(10):2879-2891, October 2006. There have been several studies that jointly use audio, lip intensity, and lip geometry information for speaker identification and speech-reading applications. This paper proposes using explicit lip motion information, instead of or in addition to lip intensity and/or geometry information, for speaker identification and speech-reading within a unified feature selection and discrimination analysis framework, and addresses two questions: 1) is using explicit lip motion information useful, and 2) if so, what are the best lip motion features for these two applications? The best lip motion features for speaker identification are considered to be those that yield the highest discrimination of individual speakers in a population, whereas for speech-reading, the best features are those providing the highest phoneme/word/phrase recognition rate. Several lip motion feature candidates have been considered, including dense motion features within a bounding box about the lip, lip contour motion features, and combinations of these with lip shape features. Furthermore, a novel two-stage, spatial, and temporal discrimination analysis is introduced to select the best lip motion features for speaker identification and speech-reading applications. Experimental results using a hidden-Markov-model-based recognition system indicate that explicit lip motion information provides additional performance gains in both applications, and lip motion features prove more valuable for the speech-reading application.
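The selection criterion the abstract describes for speaker identification — keep the features that best separate individual speakers in a population — can be sketched with a per-feature Fisher discrimination ratio. This is only an illustrative stand-in, not the paper's two-stage spatial and temporal discrimination analysis; the function names and toy data below are invented for the example.

```python
import numpy as np

def fisher_ratio(X, y):
    """Per-feature Fisher discrimination ratio: variance of the
    class means (between-class scatter) divided by the average
    within-class variance. Higher = more discriminative."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    within = np.array([X[y == c].var(axis=0) for c in classes]).mean(axis=0)
    between = means.var(axis=0)
    return between / (within + 1e-12)  # epsilon avoids division by zero

def select_top_features(X, y, k):
    """Indices of the k features with the highest Fisher ratio."""
    return np.argsort(fisher_ratio(X, y))[::-1][:k]

# Toy data: feature 0 separates the two "speakers"; feature 1 is noise.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
               rng.normal([1.0, 0.0], 0.1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(select_top_features(X, y, 1))  # → [0] (feature 0 is the discriminative one)
```

For speech-reading, the same machinery would rank features by class separability over phoneme/word/phrase labels instead of speaker labels, which is why the two applications can share a unified selection framework.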
@article{ ISI:000240776200003,
Author = {Cetinguel, H. Ertan and Yemez, Yuecel and Erzin, Engin and Tekalp, A.
Murat},
Title = {{Discriminative analysis of lip motion features for speaker
identification and speech-reading}},
Journal = {{IEEE TRANSACTIONS ON IMAGE PROCESSING}},
Year = {{2006}},
Volume = {{15}},
Number = {{10}},
Pages = {{2879-2891}},
Month = {{OCT}},
Abstract = {{There have been several studies that jointly use audio, lip intensity,
and lip geometry information for speaker identification and
speech-reading applications. This paper proposes using explicit lip
motion information, instead of or in addition to lip intensity and/or
geometry information, for speaker identification and speech-reading
within a unified feature selection and discrimination analysis
framework, and addresses two important issues: 1) Is using explicit lip
motion information useful, and, 2) if so, what are the best lip motion
features for these two applications? The best lip motion features for
speaker identification are considered to be those that result in the
highest discrimination of individual speakers in a population, whereas
for speech-reading, the best features are those providing the highest
phoneme/word/phrase recognition rate. Several lip motion feature
candidates have been considered, including dense motion features within a
bounding box about the lip, lip contour motion features, and combinations
of these with lip shape features. Furthermore, a novel two-stage,
spatial, and temporal discrimination analysis is introduced to select
the best lip motion features for speaker identification and
speech-reading applications. Experimental results using a
hidden-Markov-model-based recognition system indicate that using
explicit lip motion information provides additional performance gains in
both applications, and lip motion features prove more valuable for the
speech-reading application.}},
DOI = {{10.1109/TIP.2006.877528}},
ISSN = {{1057-7149}},
EISSN = {{1941-0042}},
ResearcherID-Numbers = {{Erzin, Engin/H-1716-2011}},
ORCID-Numbers = {{Erzin, Engin/0000-0002-2715-2368}},
Unique-ID = {{ISI:000240776200003}},
}