Generating stressed speech from neutral speech using a modified CELP vocoder. Bou-Ghazale, S. E. and Hansen, J. H. L. Speech Communication, 20(1-2):93-110, 1996.
The problem of speech modeling for generating stressed speech using a source generator framework is addressed in this paper. In general, stress in this context refers to emotional or task induced speaking conditions. Throughout this particular study, the focus will be limited to speech under angry, loud and Lombard effect (i.e., speech produced in noise) speaking conditions. Source generator theory was originally developed for equalization of speech under stress for robust recognition (Hansen, 1993, 1994). It was later used for simulated stressed training token generation for improved recognition (Bou-Ghazale, 1993; Bou-Ghazale and Hansen, 1994). The objective here is to generate stressed perturbed speech from neutral speech using a source generator framework previously employed for stressed speech recognition. The approach is based on (i) developing a mathematical model that provides a means for representing the change in speech production under stressed conditions for perturbation, and (ii) employing this framework in an isolated word speech processing system to produce emotional/stressed perturbed speech from neutral speech. A stress perturbation algorithm is formulated based on a CELP (code-excited linear prediction) speech synthesis structure. The algorithm is evaluated using four different speech feature perturbation sets. The stressed speech parameter evaluations from this study revealed that pitch is capable of reflecting the emotional state of the speaker, while formant information alone is not as good a correlate of stress. However, the combination of formant location, pitch and gain information proved to be the most reliable indicator of emotional stress under a CELP speech model. Results from formal listener evaluations of the generated stressed speech show successful classification rates of 87% for angry speech, 75% for Lombard effect speech and 92% for loud speech.
@article{bou-ghazale_generating_1996,
	Author = {Bou-Ghazale, S. E. and Hansen, John H. L.},
	Date = {1996},
	Date-Modified = {2017-04-19 08:04:06 +0000},
	Doi = {10.1016/S0167-6393(96)00047-7},
	Journal = {Speech Communication},
	Keywords = {emotions, speaking styles, speech synthesis, speech technology, stress},
	Number = {1--2},
	Pages = {93--110},
	Title = {Generating stressed speech from neutral speech using a modified CELP vocoder},
	Volume = {20},
	Abstract = {The problem of speech modeling for generating stressed speech using a source generator framework is addressed in this paper. In general, stress in this context refers to emotional or task induced speaking conditions. Throughout this particular study, the focus will be limited to speech under angry, loud and Lombard effect (i.e., speech produced in noise) speaking conditions. Source generator theory was originally developed for equalization of speech under stress for robust recognition (Hansen, 1993, 1994). It was later used for simulated stressed training token generation for improved recognition (Bou-Ghazale, 1993; Bou-Ghazale and Hansen, 1994). The objective here is to generate stressed perturbed speech from neutral speech using a source generator framework previously employed for stressed speech recognition. The approach is based on (i) developing a mathematical model that provides a means for representing the change in speech production under stressed conditions for perturbation, and (ii) employing this framework in an isolated word speech processing system to produce emotional/stressed perturbed speech from neutral speech. A stress perturbation algorithm is formulated based on a CELP (code-excited linear prediction) speech synthesis structure. The algorithm is evaluated using four different speech feature perturbation sets. The stressed speech parameter evaluations from this study revealed that pitch is capable of reflecting the emotional state of the speaker, while formant information alone is not as good a correlate of stress. However, the combination of formant location, pitch and gain information proved to be the most reliable indicator of emotional stress under a CELP speech model. Results from formal listener evaluations of the generated stressed speech show successful classification rates of 87\% for angry speech, 75\% for Lombard effect speech and 92\% for loud speech.},
	Bdsk-Url-1 = {http://dx.doi.org/10.1016/S0167-6393(96)00047-7}}