
Latent Character Model for Engagement Recognition Based on Multimodal Behaviors

  • Conference paper
  • In: 9th International Workshop on Spoken Dialogue System Technology

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 579)

Abstract

Engagement represents how much a user is interested in and willing to continue the current dialogue, and it is an important cue for spoken dialogue systems to adapt to the user's state. We address engagement recognition based on the listener's multimodal behaviors, such as backchannels, laughing, head nodding, and eye gaze. When ground-truth labels are given by multiple annotators, the labels differ from annotator to annotator because of their different perspectives on the multimodal behaviors. We assume that each annotator has a latent character that affects their perception of engagement. We propose a hierarchical Bayesian model that estimates both the engagement level and the character of each annotator as latent variables. Furthermore, we incorporate other latent variables that map the input features into a sub-space. Experimental results show that the proposed model achieves higher accuracy than models that do not take the annotator characters into account.
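The core idea above, that each annotator's labels are filtered through a latent character, can be illustrated with a toy generative simulation. This is a minimal sketch, not the paper's actual hierarchical Bayesian model: the number of characters, the bias matrix, and all variable names here are assumptions chosen only to show why character-agnostic label aggregation (e.g. majority voting) can be suboptimal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: each annotator has a latent character k that
# biases how they map a session's true engagement state into a label.
n_characters = 2
n_annotators = 5
n_sessions = 100

# Character-specific labeling bias: probability of reporting "engaged"
# given the true binary engagement state (rows: character, cols: state).
report_prob = np.array([[0.2, 0.9],    # character 0: fairly accurate
                        [0.4, 0.6]])   # character 1: noisy, indifferent

character = rng.integers(n_characters, size=n_annotators)   # latent, per annotator
true_engagement = rng.integers(2, size=n_sessions)          # latent, per session

# Each annotator labels every session according to their character.
labels = np.zeros((n_annotators, n_sessions), dtype=int)
for a in range(n_annotators):
    p = report_prob[character[a], true_engagement]
    labels[a] = rng.random(n_sessions) < p

# A character-agnostic baseline: per-session majority vote over annotators.
majority = (labels.mean(axis=0) > 0.5).astype(int)
accuracy = (majority == true_engagement).mean()
print(f"majority-vote accuracy: {accuracy:.2f}")
```

A model in the spirit of the paper would instead infer `character` and `true_engagement` jointly as latent variables, letting reliable annotators outweigh noisy ones rather than counting all votes equally.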



Acknowledgements

This work was supported by JSPS KAKENHI (Grant Number 15J07337) and JST ERATO Ishiguro Symbiotic Human-Robot Interaction program (Grant Number JPMJER1401), Japan.


Corresponding author

Correspondence to Koji Inoue.


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Inoue, K., Lala, D., Takanashi, K., Kawahara, T. (2019). Latent Character Model for Engagement Recognition Based on Multimodal Behaviors. In: D'Haro, L., Banchs, R., Li, H. (eds) 9th International Workshop on Spoken Dialogue System Technology. Lecture Notes in Electrical Engineering, vol 579. Springer, Singapore. https://doi.org/10.1007/978-981-13-9443-0_11
