Abstract
Engagement represents how much a user is interested in and willing to continue the current dialogue, and it is an important cue for spoken dialogue systems to adapt to the user's state. We address engagement recognition based on the listener's multimodal behaviors, such as backchannels, laughing, head nodding, and eye gaze. When ground-truth labels are given by multiple annotators, the labels often differ among annotators because of their different perspectives on the multimodal behaviors. We assume that each annotator has a latent character that affects his or her perception of engagement. We propose a hierarchical Bayesian model that estimates both the engagement level and the character of each annotator as latent variables. Furthermore, we incorporate additional latent variables that map the input features into a sub-space. Experimental results show that the proposed model achieves higher accuracy than models that do not take the annotator characters into account.
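The core idea of estimating annotator-specific perception jointly with the true label can be illustrated with a much simpler stand-in. The sketch below is NOT the paper's hierarchical Bayesian model (which also maps input features into a latent sub-space); it is a Dawid-Skene-style EM estimator in which each annotator's "character" is reduced to a confusion matrix over binary engagement labels. All names here are illustrative.

```python
import numpy as np

def em_annotator_model(labels, n_iter=50):
    """labels: (n_items, n_annotators) array of 0/1 engagement annotations.

    Returns per-item posteriors P(engaged) and per-annotator confusion
    matrices conf[a, true_label, observed_label].
    """
    n_items, n_annot = labels.shape
    q = labels.mean(axis=1)  # initialize P(engaged | item) with the vote share
    conf = np.zeros((n_annot, 2, 2))
    for _ in range(n_iter):
        # M-step: re-estimate each annotator's confusion matrix from q.
        for a in range(n_annot):
            for obs in (0, 1):
                mask = (labels[:, a] == obs).astype(float)
                conf[a, 1, obs] = (q * mask).sum()
                conf[a, 0, obs] = ((1.0 - q) * mask).sum()
        conf = np.clip(conf, 1e-6, None)
        conf /= conf.sum(axis=2, keepdims=True)  # normalize each true-label row
        prior = np.clip(q.mean(), 1e-6, 1.0 - 1e-6)  # class prior P(engaged)
        # E-step: recompute label posteriors given the annotator characters.
        log_p1 = np.log(prior) + np.log(conf[np.arange(n_annot), 1, labels]).sum(axis=1)
        log_p0 = np.log(1.0 - prior) + np.log(conf[np.arange(n_annot), 0, labels]).sum(axis=1)
        q = 1.0 / (1.0 + np.exp(log_p0 - log_p1))
    return q, conf
```

Even this stripped-down version captures the paper's motivation: a consistently "strict" or "lenient" annotator is absorbed into their confusion matrix rather than distorting the estimated engagement labels. The full model additionally places the character and the feature sub-space mapping in one hierarchical Bayesian formulation.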
Acknowledgements
This work was supported by JSPS KAKENHI (Grant Number 15J07337) and JST ERATO Ishiguro Symbiotic Human-Robot Interaction program (Grant Number JPMJER1401), Japan.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Inoue, K., Lala, D., Takanashi, K., Kawahara, T. (2019). Latent Character Model for Engagement Recognition Based on Multimodal Behaviors. In: D'Haro, L., Banchs, R., Li, H. (eds) 9th International Workshop on Spoken Dialogue System Technology. Lecture Notes in Electrical Engineering, vol 579. Springer, Singapore. https://doi.org/10.1007/978-981-13-9443-0_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9442-3
Online ISBN: 978-981-13-9443-0