
Analysis of significant dialog events in realistic human–computer interaction

  • Original Paper
  • Published in: Journal on Multimodal User Interfaces

Abstract

This paper addresses the automatic detection of significant dialog events (SDEs) in naturalistic HCI and the deduction of trait-specific conclusions relevant for the design of spoken dialog systems. We perform our investigations on the multimodal LAST MINUTE corpus, which contains recordings of naturalistic interactions. First, we use the textual transcripts to analyse interaction styles and discourse structures; we find indications that younger subjects prefer a more technical style when communicating with dialog systems. Next, we model the subject's internal success state with a hidden Markov model trained on the observed sequences of system feedback. This reveals that younger subjects interact significantly more successfully with technical systems. Aiming at the automatic detection of specific subject reactions, we then semi-automatically annotate SDEs, i.e. phrases indicating irregular, non-task-oriented subject behavior. We use both acoustic and linguistic features to build several trait-specific classifiers for dialog phases, which show markedly different accuracies across age and gender groups. The presented investigations coherently support the age-dependence of both expressiveness and problem-solving ability, which in turn yields design rules for future automatic "companion" systems.
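To make the success-state modelling concrete, the following is a minimal sketch, not the authors' implementation (which, per Note 3, used the jahmm library): it decodes a two-state hidden "success/failure" state from a sequence of discrete system-feedback symbols with the Viterbi algorithm. All state names, symbols and probabilities are illustrative assumptions.

```python
# Minimal sketch (illustrative only): two-state HMM over discrete
# system-feedback symbols, decoded with the Viterbi algorithm.
import numpy as np

states = ["success", "failure"]           # hidden internal success state
symbols = {"positive": 0, "negative": 1}  # observed system feedback

# Placeholder parameters; in the paper they are trained from the
# observed feedback sequences (e.g. with Baum-Welch).
start = np.array([0.6, 0.4])                   # P(state at t=0)
trans = np.array([[0.8, 0.2],                  # P(state_{t+1} | state_t)
                  [0.3, 0.7]])
emit  = np.array([[0.9, 0.1],                  # P(feedback | state)
                  [0.2, 0.8]])

def viterbi(obs):
    """Most likely hidden state sequence for a feedback sequence."""
    n, m = len(obs), len(states)
    delta = np.zeros((n, m))            # best log-prob ending in state j at time t
    psi = np.zeros((n, m), dtype=int)   # backpointers
    delta[0] = np.log(start) + np.log(emit[:, obs[0]])
    for t in range(1, n):
        for j in range(m):
            scores = delta[t - 1] + np.log(trans[:, j])
            psi[t, j] = np.argmax(scores)
            delta[t, j] = scores[psi[t, j]] + np.log(emit[j, obs[t]])
    path = [int(np.argmax(delta[-1]))]
    for t in range(n - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return [states[s] for s in reversed(path)]

feedback = [symbols[s] for s in ["positive", "positive", "negative", "negative"]]
print(viterbi(feedback))  # ['success', 'success', 'failure', 'failure']
```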


Notes

  1. http://mary.dfki.de/.

  2. Here and in the following, 'W' stands for wizard and 'S' for subject. The subject's code is given once, at his or her first phrase. Transcripts are given in GAT 2 minimal coding. English glosses are added as a convenience for the reader.

  3. http://jahmm.googlecode.com/.

  4. The following features are calculated as the ratio of the corresponding item counts to the number of words; a minimal illustration is sketched below.
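
The per-word normalization described in Note 4 can be illustrated as follows. This is a hypothetical sketch: the feature names, the example phrase, and the whitespace tokenization are illustrative assumptions, not the authors' exact feature set.

```python
# Sketch of Note 4: each linguistic feature is the count of the
# corresponding items divided by the number of words in the phrase.
def ratio_features(transcript, item_counts):
    n_words = len(transcript.split())   # naive whitespace tokenization
    return {name: count / n_words for name, count in item_counts.items()}

# Hypothetical example: item counts for one subject phrase of 10 words.
feats = ratio_features(
    "aeh I would like to pack the sun cream first",
    {"fillers": 1, "politeness_markers": 1, "task_terms": 2},
)
print(feats)  # {'fillers': 0.1, 'politeness_markers': 0.1, 'task_terms': 0.2}
```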


Acknowledgments

The study is performed in the framework of the Transregional Collaborative Research Centre SFB/TRR 62 "A Companion-Technology for Cognitive Technical Systems" funded by the German Research Foundation (DFG). Responsibility for the content lies with the authors.

Author information

Corresponding author

Correspondence to Dmytro Prylipko.

About this article

Cite this article

Prylipko, D., Rösner, D., Siegert, I. et al. Analysis of significant dialog events in realistic human–computer interaction. J Multimodal User Interfaces 8, 75–86 (2014). https://doi.org/10.1007/s12193-013-0144-x
