Abstract
Eye gaze and gesture form key conversational grounding cues that are used extensively in face-to-face interaction among people. To accurately recognize visual feedback during interaction, people often use contextual knowledge from previous and current events to anticipate when feedback is most likely to occur. In this paper, we investigate how dialog context from an embodied conversational agent (ECA) can improve visual recognition of eye gestures. We propose a new framework for contextual recognition based on Latent-Dynamic Conditional Random Field (LDCRF) models to learn the sub-structure and external dynamics of contextual cues. Our experiments show that adding contextual information improves visual recognition of eye gestures and demonstrate that the LDCRF model for context-based recognition of gaze aversion gestures outperforms Support Vector Machines, Hidden Markov Models, and Conditional Random Fields.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Morency, LP., Darrell, T. (2008). Conditional Sequence Model for Context-Based Recognition of Gaze Aversion. In: Popescu-Belis, A., Renals, S., Bourlard, H. (eds) Machine Learning for Multimodal Interaction. MLMI 2007. Lecture Notes in Computer Science, vol 4892. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78155-4_2
DOI: https://doi.org/10.1007/978-3-540-78155-4_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78154-7
Online ISBN: 978-3-540-78155-4
eBook Packages: Computer Science, Computer Science (R0)