
Analysis of significant dialog events in realistic human–computer interaction

  • Original Paper
  • Published in: Journal on Multimodal User Interfaces

Abstract

This paper addresses the automatic detection of significant dialog events (SDEs) in naturalistic HCI and the deduction of trait-specific conclusions relevant for the design of spoken dialog systems. We perform our investigations on the multimodal LAST MINUTE corpus, which contains recordings of naturalistic interactions. First, we use the textual transcripts to analyse interaction styles and discourse structures; we find indications that younger subjects prefer a more technical style when communicating with dialog systems. Next, we model the subject's internal success state with a hidden Markov model trained on the observed sequences of system feedback. This reveals that younger subjects interact significantly more successfully with technical systems. Aiming at the automatic detection of specific subject reactions, we then semi-automatically annotate SDEs, i.e. phrases indicating irregular, non-task-oriented subject behavior. We use both acoustic and linguistic features to build several trait-specific classifiers for dialog phases, which show markedly different accuracies across age and gender groups. The presented investigations coherently support the age-dependence of both expressiveness and problem-solving ability, which in turn yields design rules for future automatic "companion" systems.
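To make the success-state modelling concrete, the following is a minimal sketch, not the authors' implementation (which, per Note 3, used the jahmm library): it decodes a two-state hidden "success/failure" state from a sequence of discrete system-feedback symbols with the Viterbi algorithm. All state names, symbols and probabilities are illustrative assumptions.

```python
# Minimal sketch (illustrative only): two-state HMM over discrete
# system-feedback symbols, decoded with the Viterbi algorithm.
import numpy as np

states = ["success", "failure"]           # hidden internal success state
symbols = {"positive": 0, "negative": 1}  # observed system feedback

# Placeholder parameters; in the paper they are trained from the
# observed feedback sequences (e.g. with Baum-Welch).
start = np.array([0.6, 0.4])                   # P(state at t=0)
trans = np.array([[0.8, 0.2],                  # P(state_{t+1} | state_t)
                  [0.3, 0.7]])
emit  = np.array([[0.9, 0.1],                  # P(feedback | state)
                  [0.2, 0.8]])

def viterbi(obs):
    """Most likely hidden state sequence for a feedback sequence."""
    n, m = len(obs), len(states)
    delta = np.zeros((n, m))            # best log-prob ending in state j at time t
    psi = np.zeros((n, m), dtype=int)   # backpointers
    delta[0] = np.log(start) + np.log(emit[:, obs[0]])
    for t in range(1, n):
        for j in range(m):
            scores = delta[t - 1] + np.log(trans[:, j])
            psi[t, j] = np.argmax(scores)
            delta[t, j] = scores[psi[t, j]] + np.log(emit[j, obs[t]])
    path = [int(np.argmax(delta[-1]))]
    for t in range(n - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return [states[s] for s in reversed(path)]

feedback = [symbols[s] for s in ["positive", "positive", "negative", "negative"]]
print(viterbi(feedback))  # ['success', 'success', 'failure', 'failure']
```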


Notes

  1. http://mary.dfki.de/.

  2. Here and in the following, 'W' stands for wizard and 'S' for subject. The subject's code is given once, at his or her first phrase. Transcripts are given in GAT 2 minimal coding. English glosses are added as a convenience for the reader.

  3. http://jahmm.googlecode.com/.

  4. The following features are calculated as the ratio of the corresponding item counts to the number of words; a minimal illustration is sketched below.
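
The per-word normalization described in Note 4 can be illustrated as follows. This is a hypothetical sketch: the feature names, the example phrase, and the whitespace tokenization are illustrative assumptions, not the authors' exact feature set.

```python
# Sketch of Note 4: each linguistic feature is the count of the
# corresponding items divided by the number of words in the phrase.
def ratio_features(transcript, item_counts):
    n_words = len(transcript.split())   # naive whitespace tokenization
    return {name: count / n_words for name, count in item_counts.items()}

# Hypothetical example: item counts for one subject phrase of 10 words.
feats = ratio_features(
    "aeh I would like to pack the sun cream first",
    {"fillers": 1, "politeness_markers": 1, "task_terms": 2},
)
print(feats)  # {'fillers': 0.1, 'politeness_markers': 0.1, 'task_terms': 0.2}
```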


Acknowledgments

The study is performed in the framework of the Transregional Collaborative Research Centre SFB/TRR 62 "A Companion-Technology for Cognitive Technical Systems" funded by the German Research Foundation (DFG). Responsibility for the content lies with the authors.

Author information

Corresponding author

Correspondence to Dmytro Prylipko.

About this article

Cite this article

Prylipko, D., Rösner, D., Siegert, I. et al. Analysis of significant dialog events in realistic human–computer interaction. J Multimodal User Interfaces 8, 75–86 (2014). https://doi.org/10.1007/s12193-013-0144-x
