Emotion Recognition from Speech Using Multi-Classifier Systems and RBF-Ensembles

Scherer, Stefan; Schwenker, Friedhelm; Palm, Günther

doi:10.1007/978-3-540-75398-8_3

Part of the book series: Studies in Computational Intelligence (SCI, volume 83)

This work provides a detailed overview of related work on the emotion recognition task. Common definitions of emotions are given, and known issues such as cultural dependencies are explained. Furthermore, labeling issues are exemplified, and comparable recognition experiments and data collections are introduced in order to give an overview of the state of the art. A comparison of possible data acquisition methods is provided, covering recorded acted emotional material, induced emotional data recorded in Wizard-of-Oz scenarios, and real-life emotions. A complete automatic emotion recognizer scenario is included, comprising a possible way of collecting emotional data, a human perception experiment for data quality benchmarking, the extraction of commonly used features, and recognition experiments using multi-classifier systems and RBF ensembles. Results close to human performance were achieved using RBF ensembles, which are simple to implement and fast to train.
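The abstract names RBF ensembles without giving details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes Gaussian basis functions, centers drawn at random from the training samples of each class, output weights fitted with a simple delta rule, and members combined by averaging their outputs. All function names, parameter values, and the two-dimensional toy data (standing in for speech features) are hypothetical.

```python
import math
import random

def rbf_layer(x, centers, width):
    """Gaussian activation of every hidden unit for the input vector x."""
    return [math.exp(-sum((a - b) ** 2 for a, b in zip(x, c)) / (2 * width ** 2))
            for c in centers]

def train_member(samples, labels, n_classes, per_class, width, rng,
                 epochs=100, lr=0.1):
    """Train one RBF network (assumed scheme: random centers + delta rule)."""
    centers = []
    for k in range(n_classes):
        pool = [x for x, y in zip(samples, labels) if y == k]
        centers.extend(rng.sample(pool, per_class))  # random prototypes per class
    weights = [[0.0] * len(centers) for _ in range(n_classes)]
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            h = rbf_layer(x, centers, width)
            for k in range(n_classes):
                out = sum(w * a for w, a in zip(weights[k], h))
                err = (1.0 if k == y else 0.0) - out  # one-hot target minus output
                for j, a in enumerate(h):
                    weights[k][j] += lr * err * a
    return centers, weights, width

def member_scores(member, x):
    """Per-class output scores of a single trained member."""
    centers, weights, width = member
    h = rbf_layer(x, centers, width)
    return [sum(w * a for w, a in zip(row, h)) for row in weights]

def classify(ensemble, x):
    """Average the member outputs and return the class with the largest mean."""
    n_classes = len(ensemble[0][1])
    total = [0.0] * n_classes
    for member in ensemble:
        for k, s in enumerate(member_scores(member, x)):
            total[k] += s
    return max(range(n_classes), key=lambda k: total[k])

# Toy data: two well-separated clusters in place of real emotion features.
rng = random.Random(0)
means = [(0.0, 0.0), (3.0, 3.0)]
samples, labels = [], []
for k, (mx, my) in enumerate(means):
    for _ in range(10):
        samples.append([rng.gauss(mx, 0.3), rng.gauss(my, 0.3)])
        labels.append(k)

# Five members that differ only in their randomly drawn centers.
ensemble = [train_member(samples, labels, 2, 2, 1.0, rng) for _ in range(5)]
print(classify(ensemble, [0.0, 0.0]), classify(ensemble, [3.0, 3.0]))
```

Member diversity here comes solely from the random choice of centers; other sources of diversity, such as different feature subsets per member, would fit the same structure.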

Author information

Authors and Affiliations

Institute for Neural Information Processing, University of Ulm, 89069, Ulm, Germany
Stefan Scherer, Friedhelm Schwenker & Günther Palm

Editor information

Editors and Affiliations

Department of Computer and Information Sciences, Florida A&M University, Tallahassee, FL 32307, USA
Bhanu Prasad
Department of Electronics and Communication Engineering, Indian Institute of Technology Guwahati, Guwahati, India
S. R. Mahadeva Prasanna

About this chapter

Cite this chapter

Scherer, S., Schwenker, F., Palm, G. (2008). Emotion Recognition from Speech Using Multi-Classifier Systems and RBF-Ensembles. In: Prasad, B., Prasanna, S.R.M. (eds) Speech, Audio, Image and Biomedical Signal Processing using Neural Networks. Studies in Computational Intelligence, vol 83. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75398-8_3

DOI: https://doi.org/10.1007/978-3-540-75398-8_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75397-1
Online ISBN: 978-3-540-75398-8
eBook Packages: Engineering, Engineering (R0)
