Automatic detection and annotation of disfluencies in spoken French corpora. Christodoulides, G. & Avanzi, M. In Interspeech 2015. Proceedings of the 16th Annual Conference of the International Speech Communication Association, pages 1849–1853, 2015.
In this paper we propose a multi-step system for the semi-automatic detection and annotation of disfluencies in spoken corpora. A set of rules, statistical models and machine learning techniques are applied to the input, which is a transcription aligned to the speech signal. The system uses the results of an automatic estimation of prosodic, part-of-speech and shallow syntactic features. We present a detailed coding scheme for simple disfluencies (filled pauses, mispronunciations, false starts, drawls and intra-word pauses), structured disfluencies (repetitions, deletions, substitutions, insertions) and complex disfluencies. The system is trained and evaluated on a transcribed corpus of spontaneous French speech, consisting of 112 different speakers and balanced for speaker age and sex, covering 14 different varieties of French spoken in Belgium, France and Switzerland.
@inproceedings{christodoulides_automatic_2015,
	Author = {Christodoulides, George and Avanzi, Mathieu},
	Booktitle = {Interspeech 2015. Proceedings of the 16th Annual Conference of the International Speech Communication Association},
	Date = {2015},
	Date-Modified = {2018-07-20 18:49:25 +0000},
	Eventdate = {2015-09-06/2015-09-10},
	Keywords = {labelling and annotation, language resources, disfluencies, French, phonetics, speaking styles, speech technology, spontaneous speech, speech corpora},
	Location = {Dresden, Germany},
	Pages = {1849--1853},
	Title = {Automatic detection and annotation of disfluencies in spoken French corpora},
	Url = {http://www.isca-speech.org/archive/interspeech_2015/i15_1849.html},
	Year = {2015},
	Abstract = {In this paper we propose a multi-step system for the semi-automatic detection and annotation of disfluencies in spoken corpora. A set of rules, statistical models and machine learning techniques are applied to the input, which is a transcription aligned to the speech signal. The system uses the results of an automatic estimation of prosodic, part-of-speech and shallow syntactic features. We present a detailed coding scheme for simple disfluencies (filled pauses, mispronunciations, false starts, drawls and intra-word pauses), structured disfluencies (repetitions, deletions, substitutions, insertions) and complex disfluencies. The system is trained and evaluated on a transcribed corpus of spontaneous French speech, consisting of 112 different speakers and balanced for speaker age and sex, covering 14 different varieties of French spoken in Belgium, France and Switzerland.},
	Bdsk-Url-1 = {http://www.isca-speech.org/archive/interspeech_2015/i15_1849.html}}
