Spontaneous speech events in two speech databases of human-computer and human-human dialogs in Spanish. Rodríguez Fuentes, Luis Javier and Torres, M. Language and Speech, 49(3):333-366, 2006.
Previous works in English have revealed that disfluencies follow regular patterns and that incorporating them into the language model of a speech recognizer leads to lower perplexities and sometimes to better performance. Although work on disfluency modeling has been applied outside the English community (e.g., in Japanese), as far as we know there is no specific work dealing with disfluencies in Spanish. In this paper, we follow a data-driven approach in exploring the potential benefit of modeling disfluencies in a speech recognizer in Spanish. Two databases of human-computer and human-human dialogs are considered, which allow the absolute and relative frequencies of disfluencies in the two situations to be compared. The rate of disfluencies in human-human dialogs is found to be very close to that found for similar databases in English. Due to setup factors, the rate of disfluencies found in human-computer dialogs was remarkably higher than that reported for similar databases in English. In any case, from the point of view of speech recognition, the high frequencies of disfluencies and the distinct features of the acoustic events related to them support the need for explicit acoustic models. The regularities observed in the distribution of filled pauses and speech repairs reveal that including them in the language model of the speech recognizer may also be helpful. The extent to which the number of events depends on utterance length and on the speaker is also explored. Statistics are shown that follow previous studies for English, and a sizeable space is devoted to comparing our results with them. Finally, various possible cues for the automatic detection of speech repairs--a key issue from the point of view of speech understanding--are explored: silent pauses, filled pauses, lengthenings, cut-off words and discourse markers. As previously observed for English, none of them was found to be reliable by itself. More information, especially at the acoustic-prosodic level, is no doubt needed to reliably detect speech repairs.
@article{rodriguez_fuentes_spontaneous_2006,
	Abstract = {Previous works in English have revealed that disfluencies follow regular patterns and that incorporating them into the language model of a speech recognizer leads to lower perplexities and sometimes to a better performance. Although work on disfluency modeling has been applied outside the English community (e.g., in Japanese), as far as we know there is no specific work dealing with disfluencies in Spanish. In this paper, we follow a data driven approach in exploring the potential benefit of modeling disfluencies in a speech recognizer in Spanish. Two databases of human-computer and human-human dialogs are considered, which allow the absolute and relative frequencies of disfluencies in the two situations to be compared. The rate of disfluencies in human-human dialogs is found to be very close to that found for similar databases in English. Due to setup factors, the rate of disfluencies found in human-computer dialogs was remarkably higher than that reported for similar databases in English. In any case, from the point of view of speech recognition, the high frequencies of disfluencies and the distinct features of the acoustic events related to them support the need for explicit acoustic models. The regularities observed in the distribution of filled pauses and speech repairs reveal that including them in the language model of the speech recognizer may be also helpful. The extent to which the number of events depends on utterance length and on the speaker is also explored. Statistics are shown that follow previous studies for English, and a sizeable space is devoted to comparing our results with them. Finally, various possible cues for the automatic detection of speech repairs--a key issue from the point of view of speech understanding--are explored: silent pauses, filled pauses, lengthenings, cut off words and discourse markers. As previously observed for English, none of them was found to be reliable by itself. 
More information, especially at the acoustic-prosodic level, is no doubt needed to reliably detect speech repairs.},
	Author = {Rodr{\'i}guez Fuentes, Luis Javier and Torres, Mar{\'i}a In{\'e}s},
	Date-Modified = {2016-09-24 18:56:14 +0000},
	Doi = {10.1177/00238309060490030201},
	Issn = {0023-8309},
	Journal = {Language and Speech},
	Keywords = {corpus, dialogue systems, disfluencies, ESTIVOZ, language resources, phonetics, Spanish, speaking styles, speech corpus, speech technology, spontaneous speech},
	Number = {3},
	Pages = {333--366},
	Title = {Spontaneous speech events in two speech databases of human-computer and human-human dialogs in Spanish},
	Url = {http://las.sagepub.com/cgi/content/abstract/49/3/333},
	Volume = {49},
	Year = {2006},
	Bdsk-Url-1 = {http://las.sagepub.com/cgi/content/abstract/49/3/333},
	Bdsk-Url-2 = {http://dx.doi.org/10.1177/00238309060490030201}}