Automatic metadata extraction from spoken content using speech and speaker recognition techniques. Delgado, H.; Serrano, J.; and Carrabina, J. In FALA 2010. VI Jornadas en Tecnología del Habla - II Iberian SLTech Workshop, pages 201-204. Centro Social Caixanova, Vigo, Spain. 10-12 November, 2010.
Today, information extraction plays a significant role in the management of massive quantities of data for different purposes. One of the open challenges in this field is the automatic extraction of information from audio streams. This paper describes a metadata extraction system that combines speech and speaker recognition tasks. The system carries out speech transcription through a Catalan language recognizer based on Hidden Markov Model (HMM) tied-state cross-word triphone acoustic models, Mel Frequency Cepstral Coefficients (MFCC) and N-gram language modeling. In addition, speaker diarization is performed using HMM-based segmentation and Perceptual Linear Prediction (PLP) feature extraction. Both the speech-to-text transcription and the speaker diarization can be used as annotation data for multimedia content. To make indexing and retrieval more flexible and efficient, the extracted metadata is stored using the MPEG-7 multimedia content description interface. The system has been successfully tested on recordings of the plenary sessions of the Catalan Parliament.
@inproceedings{delgado_automatic_2010,
	Author = {Delgado, Héctor and Serrano, Javier and Carrabina, Jordi},
	Booktitle = {FALA 2010. VI Jornadas en Tecnología del Habla - II Iberian SLTech Workshop},
	Date = {2010},
	Date-Modified = {2016-09-24 18:56:01 +0000},
	File = {Attachment:files/2940/Delgado, Serrano, Carrabina - 2010 - Automatic metadata extraction from spoken content using speech and speaker recognition techniques.pdf:application/pdf},
	Keywords = {applications, speaker recognition, speech technology},
	Pages = {201--204},
	Venue = {Centro Social Caixanova, Vigo, Spain},
	Eventdate = {2010-11-10/2010-11-12},
	Title = {Automatic metadata extraction from spoken content using speech and speaker recognition techniques},
	Url = {http://fala2010.uvigo.es/images/proceedings/pdfs/0043.pdf},
	Abstract = {Today, information extraction plays a significant role in the management of massive quantities of data for different purposes. One of the open challenges in this field is the automatic extraction of information from audio streams. This paper describes a metadata extraction system that combines speech and speaker recognition tasks. The system carries out speech transcription through a Catalan language recognizer based on Hidden Markov Model (HMM) tied-state cross-word triphone acoustic models, Mel Frequency Cepstral Coefficients (MFCC) and N-gram language modeling. In addition, speaker diarization is performed using HMM-based segmentation and Perceptual Linear Prediction (PLP) feature extraction. Both the speech-to-text transcription and the speaker diarization can be used as annotation data for multimedia content. To make indexing and retrieval more flexible and efficient, the extracted metadata is stored using the MPEG-7 multimedia content description interface. The system has been successfully tested on recordings of the plenary sessions of the Catalan Parliament.},
	Bdsk-Url-1 = {http://fala2010.uvigo.es/images/proceedings/pdfs/0043.pdf}}