This chapter gives a detailed overview of related work on the emotion recognition task. Common definitions of emotion are given, and known issues such as cultural dependencies are explained. Labeling issues are illustrated with examples, and comparable recognition experiments and data collections are introduced to give an overview of the state of the art. Possible data acquisition methods are compared: recording acted emotional material, recording induced emotions in Wizard-of-Oz scenarios, and recording real-life emotions. The chapter also presents a complete automatic emotion recognition scenario, comprising one possible way of collecting emotional data, a human perception experiment for benchmarking data quality, the extraction of commonly used features, and recognition experiments using multi-classifier systems and RBF ensembles. Results close to human performance were achieved using RBF ensembles, which are simple to implement and fast to train.
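To make the idea of an RBF ensemble concrete, the following is a minimal sketch rather than the chapter's actual method. It assumes Gaussian basis functions, centers drawn at random from a bootstrap sample, output weights fitted by least squares, and simple averaging of the member outputs. All function names, parameters and default values are hypothetical, and the input is assumed to be a matrix of already extracted feature vectors with integer class labels.

```python
import numpy as np


def rbf_design(X, centers, width):
    """Gaussian RBF activations for every (sample, center) pair."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * width ** 2))


def train_rbf(X, y, n_classes, n_centers, width, rng):
    """Fit one RBF network (assumed scheme: random centers, least-squares weights)."""
    idx = rng.choice(len(X), size=n_centers, replace=False)
    centers = X[idx]
    H = rbf_design(X, centers, width)
    targets = np.eye(n_classes)[y]  # one-hot class targets
    weights, *_ = np.linalg.lstsq(H, targets, rcond=None)
    return centers, weights


def train_ensemble(X, y, n_classes, n_members=5, n_centers=10, width=1.0, seed=0):
    """Train several RBF networks, each on its own bootstrap sample."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_members):
        boot = rng.integers(0, len(X), size=len(X))  # sample with replacement
        models.append(train_rbf(X[boot], y[boot], n_classes, n_centers, width, rng))
    return models


def predict_ensemble(models, X, width):
    """Average the outputs of all member networks and pick the top class."""
    outputs = [rbf_design(X, centers, width) @ weights for centers, weights in models]
    return np.mean(outputs, axis=0).argmax(axis=1)
```

A call such as `predict_ensemble(train_ensemble(X, y, n_classes=4), X, width=1.0)` returns one predicted class index per row of `X`. Training each member only requires solving a linear least-squares problem, which is one reason ensembles of this kind can be fast to train.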
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Scherer, S., Schwenker, F., Palm, G. (2008). Emotion Recognition from Speech Using Multi-Classifier Systems and RBF-Ensembles. In: Prasad, B., Prasanna, S.R.M. (eds) Speech, Audio, Image and Biomedical Signal Processing using Neural Networks. Studies in Computational Intelligence, vol 83. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75398-8_3
DOI: https://doi.org/10.1007/978-3-540-75398-8_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75397-1
Online ISBN: 978-3-540-75398-8
eBook Packages: Engineering, Engineering (R0)