Abstract
To achieve a more human-like interaction with technical systems, these systems have to adapt to the users’ individual skills, preferences, and current emotional state. In human-human interaction (HHI), the behaviour of the speaker is characterised by semantic and prosodic cues, given as short feedback signals. With minimal effort, these signals communicate dialogue functions such as attention, understanding, confirmation, or other attitudinal reactions. They therefore play an important role in the progress and coordination of interaction: they allow the partners to inform each other of their behavioural or affective state without interrupting the ongoing dialogue.
Vocal communication provides acoustic details revealing the speaker’s feelings, beliefs, and social relations. Incorporating discourse particles (DPs) in human-computer interaction (HCI) systems will allow the detection of complex emotions, which are currently hard to access. Complex emotions, in turn, are closely related to human behaviour. Hence, integrating automatic DP detection and complex emotion assignment provides a first approach to building human behaviour understanding into HCI systems.
In this paper, we present methods for extracting the pitch contour of DPs and for assigning complex emotions to observed DPs. We investigate the occurrences of DPs in naturalistic HCI and show that DPs may be assigned to complex emotions automatically. Furthermore, we show that DPs are indeed related to behaviour, exhibiting an age- and gender-specific usage during naturalistic HCI. Additionally, we demonstrate that DPs may be used to automatically detect and classify complex emotions during HCI.
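The abstract does not state how the pitch contour is computed, so the following is only a minimal sketch of one common approach, frame-wise autocorrelation, and should not be read as the authors' method. The function names `estimate_f0` and `pitch_contour`, the 75–400 Hz search range, the 40 ms frame, the 10 ms hop, and the 0.3 voicing threshold are all assumptions chosen for illustration.

```python
import math


def estimate_f0(frame, sample_rate, f0_min=75.0, f0_max=400.0, voicing=0.3):
    """Estimate F0 (Hz) of one frame by autocorrelation; 0.0 means unvoiced.

    All default values are illustrative assumptions, not taken from the paper.
    """
    n = len(frame)
    if n == 0:
        return 0.0
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:                      # silent frame
        return 0.0
    lag_min = int(sample_rate / f0_max)    # shortest period considered
    lag_max = min(int(sample_rate / f0_min), n - 1)
    best_lag, best_corr = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag, normalised by the frame energy.
        corr = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    if best_corr < voicing:                # weak periodicity: treat as unvoiced
        return 0.0
    return sample_rate / best_lag


def pitch_contour(signal, sample_rate, frame_len=0.04, hop=0.01):
    """Return one F0 estimate per frame for a mono signal (list of floats)."""
    size = int(frame_len * sample_rate)
    step = int(hop * sample_rate)
    return [
        estimate_f0(signal[start:start + size], sample_rate)
        for start in range(0, len(signal) - size + 1, step)
    ]


# Usage on a synthetic 200 Hz tone standing in for a recorded particle.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * i / sample_rate) for i in range(1600)]
contour = pitch_contour(tone, sample_rate)
```

For the synthetic tone, each entry of `contour` should lie close to 200 Hz. On real recordings of a particle such as "hm", a contour like this would typically still need smoothing and octave-error handling before it could be compared against prototype contours.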
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Siegert, I., Hartmann, K., Philippou-Hübner, D., Wendemuth, A. (2013). Human Behaviour in HCI: Complex Emotion Detection through Sparse Speech Features. In: Salah, A.A., Hung, H., Aran, O., Gunes, H. (eds) Human Behavior Understanding. HBU 2013. Lecture Notes in Computer Science, vol 8212. Springer, Cham. https://doi.org/10.1007/978-3-319-02714-2_21
DOI: https://doi.org/10.1007/978-3-319-02714-2_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02713-5
Online ISBN: 978-3-319-02714-2
eBook Packages: Computer Science (R0)