Speech Emotion Perception by Human and Machine

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5042)

Abstract

Human speech contains and reflects information about the emotional state of the speaker. Research on emotions is of growing importance in telematics, information technology, and even health services. Determining the characteristic acoustic parameters of emotions is a very complicated task: emotions are mainly characterized by suprasegmental parameters, but segmental factors can contribute to their perception as well, and these parameters vary within a language, across speakers, and so on. In the first part of our research, human emotion perception was examined. The steps of creating an emotional speech database are presented. The database contains recordings of 3 Hungarian sentences with 8 basic emotions pronounced by nonprofessional speakers. A perception test on these recordings yielded recognition results similar to those of an earlier perception test using professional actors and actresses. It also became clear that hearing a neutral sentence from the same speaker before listening to an emotional utterance does not help the perception of the emotion to any great extent. In the second part of our research, an automatic emotion recognition system was developed. Statistical methods (hidden Markov models, HMMs) were used to train a separate model for each emotion. Recognition was optimized by varying the acoustic preprocessing parameters and the number of states of the Markov models.




Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tóth, S.L., Sztahó, D., Vicsi, K. (2008). Speech Emotion Perception by Human and Machine. In: Esposito, A., Bourbakis, N.G., Avouris, N., Hatzilygeroudis, I. (eds) Verbal and Nonverbal Features of Human-Human and Human-Machine Interaction. Lecture Notes in Computer Science, vol 5042. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70872-8_16

  • DOI: https://doi.org/10.1007/978-3-540-70872-8_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-70871-1

  • Online ISBN: 978-3-540-70872-8

  • eBook Packages: Computer Science (R0)
