@article{oudeyer_production_2003,
	Abstract = {This paper presents algorithms that allow a robot to express its emotions by modulating the intonation of its voice. They are very simple and efficiently provide life-like speech through the use of concatenative speech synthesis. We describe a technique that allows continuous control of both the age of a synthetic voice and the quantity of emotion that is expressed. We also present the first large-scale data-mining experiment on the automatic recognition of basic emotions in informal everyday short utterances, focusing on the speaker-dependent problem. We compare a large set of machine learning algorithms, including neural networks, Support Vector Machines, and decision trees, together with 200 features, using a large database of several thousand examples. We show that the difference in performance among learning schemes can be substantial, and that some previously unexplored features are of crucial importance. An optimal feature set is derived through the use of a genetic algorithm. Finally, we explain how this study can be applied to real-world situations in which very few examples are available. Furthermore, we describe a game to play with a personal robot that facilitates the teaching of examples of emotional utterances in a natural and rather unconstrained manner.},
	Author = {Oudeyer, P Y},
	Doi = {10.1016/S1071-5819(02)00141-6},
	File = {Attachment:files/8761/Oudeyer - 2003 - The production and recognition of emotions in speech features and algorithms.pdf:application/pdf},
	Journal = {International Journal of Human-Computer Studies},
	Keywords = {emotions, speaking styles, speech synthesis, speech technology},
	Pages = {157--183},
	Title = {The production and recognition of emotions in speech: features and algorithms},
	Url = {http://pyoudeyer.com/emotionsIJHCS.pdf},
	Volume = {59},
	Year = {2003},
	Bdsk-Url-1 = {http://pyoudeyer.com/emotionsIJHCS.pdf},
	Bdsk-Url-2 = {http://dx.doi.org/10.1016/S1071-5819(02)00141-6}}