Expressive speech synthesis using a concatenative synthesizer. Bulut, M.; Narayanan, S. S.; and Syrdal, A. K. In ICSLP 2002 - Interspeech 2002. Proceedings of the 7th International Conference on Spoken Language Processing, pages 1265-1268. Denver, Colorado, USA, September 16-20, 2002.
This paper describes an experiment in synthesizing four emotional states - anger, happiness, sadness and neutral - using a concatenative speech synthesizer. To achieve this, five emotionally (i.e., semantically) unbiased target sentences were prepared. Then, separate speech inventories, comprising the target diphones for each of the above emotions, were recorded. Using the 16 different combinations of prosody and inventory during synthesis resulted in 80 synthetic test sentences. The results were evaluated by conducting listening tests with 33 naïve listeners. Synthesized anger was recognized with 86.1% accuracy, sadness with 89.1%, happiness with 44.2%, and neutral emotion with 81.8% accuracy. According to our results, anger was classified as inventory dominant and sadness and neutral as prosody dominant. Results were not sufficient to make similar conclusions regarding happiness. The highest recognition accuracies were achieved for sentences synthesized by using prosody and diphone inventory belonging to the same emotion.
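The stimulus design described in the abstract reduces to a full cross of prosody source and diphone-inventory source. The short Python sketch below is not from the paper; the sentence placeholders and variable names are illustrative. It only reproduces the arithmetic reported above: 4 x 4 = 16 prosody/inventory combinations applied to 5 target sentences gives 80 synthetic test stimuli, 20 of which are "matched" (prosody and inventory from the same emotion), the condition the abstract reports as best recognized.

# Illustrative sketch of the prosody x inventory stimulus design (not from the paper).
from itertools import product

emotions = ["anger", "happiness", "sadness", "neutral"]
target_sentences = [f"sentence_{i}" for i in range(1, 6)]  # 5 emotionally unbiased sentences (placeholders)

# Cross the prosody source with the inventory source: 4 x 4 = 16 combinations.
combinations = list(product(emotions, emotions))
assert len(combinations) == 16

# One synthetic stimulus per (prosody, inventory, sentence) triple: 16 x 5 = 80.
stimuli = [
    {"prosody": prosody, "inventory": inventory, "sentence": sentence}
    for prosody, inventory in combinations
    for sentence in target_sentences
]
assert len(stimuli) == 80

# "Matched" stimuli use prosody and inventory from the same emotion;
# the abstract reports these were recognized most accurately.
matched = [s for s in stimuli if s["prosody"] == s["inventory"]]
print(len(stimuli), len(matched))  # 80 20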
@inproceedings{bulut_expressive_2002,
	Author = {Bulut, Murtaza and Narayanan, Shrikanth S. and Syrdal, Ann K.},
	Booktitle = {ICSLP 2002 - Interspeech 2002. Proceedings of the 7th International Conference on Spoken Language Processing},
	Year = {2002},
	Keywords = {emotions, speaking styles, speech synthesis, speech technology},
	Pages = {1265-1268},
	Address = {Denver, Colorado, USA},
	Month = {September},
	Title = {Expressive speech synthesis using a concatenative synthesizer},
	Url = {http://www2.research.att.com/~ttsweb/tts/papers/2002_ICSLP/expressive.pdf},
	Abstract = {This paper describes an experiment in synthesizing four emotional states - anger, happiness, sadness and neutral - using a concatenative speech synthesizer. To achieve this, five emotionally (i.e., semantically) unbiased target sentences were prepared. Then, separate speech inventories, comprising the target diphones for each of the above emotions, were recorded. Using the 16 different combinations of prosody and inventory during synthesis resulted in 80 synthetic test sentences. The results were evaluated by conducting listening tests with 33 naïve listeners. Synthesized anger was recognized with 86.1\% accuracy, sadness with 89.1\%, happiness with 44.2\%, and neutral emotion with 81.8\% accuracy. According to our results, anger was classified as inventory dominant and sadness and neutral as prosody dominant. Results were not sufficient to make similar conclusions regarding happiness. The highest recognition accuracies were achieved for sentences synthesized by using prosody and diphone inventory belonging to the same emotion.},
}