Abstract
The goals of this research were: (1) to develop a system that automatically measures changes in the emotional state of a speaker by analyzing his/her voice, (2) to validate this system with a controlled experiment and (3) to visualize the results to the speaker in two-dimensional space. Natural (non-acted) speech of 77 Dutch speakers was collected and manually divided into meaningful speech units. Three recordings were collected per speaker: one in a positive, one in a neutral and one in a negative state. For each recording, the speakers rated 16 emotional states on a 10-point Likert scale. The Random Forest algorithm was applied to 207 speech features extracted from the recordings to qualify (classification) and quantify (regression) the changes in the speaker’s emotional state. Results showed that both the direction of change of an emotion and the change in its intensity can be predicted better than the baseline (the most frequent class label and the mean value of change, respectively), with the latter measured by Mean Squared Error. Moreover, changes in negative emotions turned out to be more predictable than changes in positive emotions. A controlled experiment investigated the difference between human and machine performance in judging the emotional states in one’s own voice and in that of another speaker. Results showed that humans performed worse than the algorithm on both the detection and the regression problems. Humans, like the algorithm, were better at detecting changes in negative emotions than changes in positive ones. Finally, applying Principal Component Analysis (PCA) to our data provided a validation of dimensional emotion theories, and the results suggest that PCA is a promising technique for visualizing the user’s emotional state in the envisioned application.
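The two baselines named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy labels and ratings, and the "model" predictions below are all hypothetical, chosen only to show how a majority-class baseline (for the direction of change) and a mean-value baseline scored by Mean Squared Error (for the size of change) would be computed.

```python
from collections import Counter
from statistics import mean


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most frequent training label."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == majority_label)
    return hits / len(test_labels)


def mse(predictions, targets):
    """Mean Squared Error between predicted and observed changes."""
    return mean((p - t) ** 2 for p, t in zip(predictions, targets))


def mean_baseline_mse(train_changes, test_changes):
    """MSE of always predicting the mean change seen in the training data."""
    constant = mean(train_changes)
    return mse([constant] * len(test_changes), test_changes)


# Hypothetical toy data: direction of change in one self-rated emotion.
train_direction = ["up", "down", "down", "same", "down"]
test_direction = ["down", "up", "down", "down"]

# Hypothetical change in rating between two recordings.
train_change = [2, -3, -1, 0, -3]
test_change = [-2, 1, -1, -3]

# Hypothetical regression output of some trained model (assumed, not real).
model_change = [-1.5, 0.5, -1.0, -2.0]

print("majority-class accuracy:", majority_baseline_accuracy(train_direction, test_direction))
print("mean-value baseline MSE:", mean_baseline_mse(train_change, test_change))
print("model MSE:", mse(model_change, test_change))
```

A classifier is only informative if its accuracy exceeds the majority-class figure, and a regressor only if its MSE falls below the mean-value figure; this is the sense in which the abstract reports performance "better than the baseline".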
Cite this article
van der Wal, C.N., Kowalczyk, W. Detecting changing emotions in human speech by machine and humans. Appl Intell 39, 675–691 (2013). https://doi.org/10.1007/s10489-013-0449-1
DOI: https://doi.org/10.1007/s10489-013-0449-1