Automatic detection and annotation of disfluencies in spoken French corpora

Automatic detection and annotation of disfluencies in spoken French corpora. Christodoulides, G. & Avanzi, M. In Interspeech 2015. Proceedings of the 16th Annual Conference of the International Speech Communication Association, pages 1849–1853, 2015.


In this paper we propose a multi-step system for the semi-automatic detection and annotation of disfluencies in spoken corpora. A set of rules, statistical models and machine learning techniques are applied to the input, which is a transcription aligned to the speech signal. The system uses the results of an automatic estimation of prosodic, part-of-speech and shallow syntactic features. We present a detailed coding scheme for simple disfluencies (filled pauses, mispronunciations, false starts, drawls and intra-word pauses), structured disfluencies (repetitions, deletions, substitutions, insertions) and complex disfluencies. The system is trained and evaluated on a transcribed corpus of spontaneous French speech, consisting of 112 different speakers and balanced for speaker age and sex, covering 14 different varieties of French spoken in Belgium, France and Switzerland.
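The rule-based part of such a pipeline can be illustrated with a minimal sketch over a time-aligned transcription. This is not the authors' system: the `Token` class, the `FILLED_PAUSES` lexicon, the `annotate` function and the `FIL`/`REP` labels are all hypothetical names chosen for illustration, and only two of the categories named in the abstract (filled pauses and repetitions) are covered, using plain string matching rather than prosodic or syntactic features.

```python
from dataclasses import dataclass, field

# Hypothetical lexicon of French filled pauses (an assumption, not from the paper).
FILLED_PAUSES = {"euh", "hum", "hm"}


@dataclass
class Token:
    """One word of a transcription aligned to the speech signal."""
    text: str
    start: float  # onset time in seconds
    end: float    # offset time in seconds
    labels: list = field(default_factory=list)


def annotate(tokens):
    """Tag filled pauses (FIL) and immediate word repetitions (REP) in place."""
    previous = None  # last token that was not a filled pause
    for token in tokens:
        word = token.text.lower()
        if word in FILLED_PAUSES:
            token.labels.append("FIL")
            continue  # a filled pause does not interrupt a repetition
        if previous is not None and previous.text.lower() == word:
            # Mark both members of the repeated pair, avoiding duplicate labels.
            if "REP" not in previous.labels:
                previous.labels.append("REP")
            token.labels.append("REP")
        previous = token
    return tokens


# Illustrative input: "je je euh je pense" with invented timings.
tokens = [
    Token("je", 0.00, 0.15),
    Token("je", 0.20, 0.32),
    Token("euh", 0.40, 0.80),
    Token("je", 0.85, 0.98),
    Token("pense", 1.00, 1.40),
]
annotate(tokens)
for token in tokens:
    print(f"{token.start:.2f}-{token.end:.2f}\t{token.text}\t{','.join(token.labels)}")
```

A real system of the kind described would presumably combine rules like these with the statistical and machine learning components mentioned in the abstract; the sketch only shows how labels can be attached to aligned tokens.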

@inproceedings{christodoulides_automatic_2015,
	Author = {Christodoulides, George and Avanzi, Mathieu},
	Booktitle = {Interspeech 2015. Proceedings of the 16th Annual Conference of the International Speech Communication Association},
	Date = {2015},
	Date-Modified = {2018-07-20 18:49:25 +0000},
	Eventdate = {2015-09-06/2015-09-10},
	Keywords = {labelling and annotation, language resources, disfluencies, French, phonetics, speaking styles, speech technology, spontaneous speech, speech corpora},
	Location = {Dresden, Germany},
	Pages = {1849--1853},
	Title = {Automatic detection and annotation of disfluencies in spoken French corpora},
	Url = {http://www.isca-speech.org/archive/interspeech_2015/i15_1849.html},
	Year = {2015},
	Abstract = {In this paper we propose a multi-step system for the semi-automatic detection and annotation of disfluencies in spoken corpora. A set of rules, statistical models and machine learning techniques are applied to the input, which is a transcription aligned to the speech signal. The system uses the results of an automatic estimation of prosodic, part-of-speech and shallow syntactic features. We present a detailed coding scheme for simple disfluencies (filled pauses, mispronunciations, false starts, drawls and intra-word pauses), structured disfluencies (repetitions, deletions, substitutions, insertions) and complex disfluencies. The system is trained and evaluated on a transcribed corpus of spontaneous French speech, consisting of 112 different speakers and balanced for speaker age and sex, covering 14 different varieties of French spoken in Belgium, France and Switzerland.},
	Bdsk-Url-1 = {http://www.isca-speech.org/archive/interspeech_2015/i15_1849.html}}

