Conditional Sequence Model for Context-Based Recognition of Gaze Aversion

Morency, Louis-Philippe; Darrell, Trevor

doi:10.1007/978-3-540-78155-4_2

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4892)

Included in the following conference series:

International Workshop on Machine Learning for Multimodal Interaction

Abstract

Eye gaze and gesture form key conversational grounding cues that are used extensively in face-to-face interaction among people. To accurately recognize visual feedback during interaction, people often use contextual knowledge from previous and current events to anticipate when feedback is most likely to occur. In this paper, we investigate how dialog context from an embodied conversational agent (ECA) can improve visual recognition of eye gestures. We propose a new framework for contextual recognition based on Latent-Dynamic Conditional Random Field (LDCRF) models to learn the sub-structure and external dynamics of contextual cues. Our experiments show that adding contextual information improves visual recognition of eye gestures, and that the LDCRF model for context-based recognition of gaze aversion gestures outperforms Support Vector Machines, Hidden Markov Models, and Conditional Random Fields.
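The abstract names LDCRF models without spelling out how they produce frame-level labels. As a rough illustration only, and not the authors' implementation, the sketch below assumes the standard LDCRF setup: each label (for example "gaze aversion" vs. "no gaze aversion") owns a disjoint set of hidden states, a linear-chain CRF is defined over the hidden states, and the probability of a label at a frame is obtained by running forward-backward over hidden states and summing the hidden-state marginals that belong to that label. The function name `ldcrf_label_marginals`, its log-potential inputs (`node_scores`, `trans_scores`), and the label-to-hidden-state mapping are hypothetical; feature extraction from visual and dialog-context cues, and parameter training, are omitted.

```python
import math
from typing import List, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def ldcrf_label_marginals(
    node_scores: List[List[float]],   # hypothetical: T x H log-potentials per frame and hidden state
    trans_scores: List[List[float]],  # hypothetical: H x H log-potentials for hidden transitions a -> b
    hidden_to_label: List[int],       # label owning each hidden state (disjoint sets, as in LDCRF)
    num_labels: int,
) -> List[List[float]]:
    """Return P(y_t = y | x) for every frame t by summing hidden-state marginals per label."""
    T, H = len(node_scores), len(hidden_to_label)
    if T == 0:
        return []

    # Forward pass: alpha[t][b] = log-score of all hidden prefixes ending in state b at frame t.
    alpha = [[0.0] * H for _ in range(T)]
    alpha[0] = list(node_scores[0])
    for t in range(1, T):
        for b in range(H):
            alpha[t][b] = node_scores[t][b] + logsumexp(
                [alpha[t - 1][a] + trans_scores[a][b] for a in range(H)]
            )

    # Backward pass: beta[t][a] = log-score of all hidden suffixes after frame t given state a.
    beta = [[0.0] * H for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for a in range(H):
            beta[t][a] = logsumexp(
                [trans_scores[a][b] + node_scores[t + 1][b] + beta[t + 1][b] for b in range(H)]
            )

    log_z = logsumexp(alpha[T - 1])  # log partition function over all hidden sequences

    # Because hidden-state sets are disjoint, each label marginal is a sum of hidden marginals.
    marginals = []
    for t in range(T):
        probs = [0.0] * num_labels
        for a in range(H):
            probs[hidden_to_label[a]] += math.exp(alpha[t][a] + beta[t][a] - log_z)
        marginals.append(probs)
    return marginals
```

For example, with two labels and two hidden states per label, `hidden_to_label = [0, 0, 1, 1]`; a frame is tagged as gaze aversion when its label-1 marginal is the larger one. Keeping hidden states private to each label is what lets the model capture sub-structure within a gesture while still yielding a single label per frame.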

Author information

Authors and Affiliations

MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA 02139
Louis-Philippe Morency & Trevor Darrell

Editor information

Andrei Popescu-Belis, Steve Renals, Hervé Bourlard

About this paper

Cite this paper

Morency, LP., Darrell, T. (2008). Conditional Sequence Model for Context-Based Recognition of Gaze Aversion. In: Popescu-Belis, A., Renals, S., Bourlard, H. (eds) Machine Learning for Multimodal Interaction. MLMI 2007. Lecture Notes in Computer Science, vol 4892. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78155-4_2

DOI: https://doi.org/10.1007/978-3-540-78155-4_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78154-7
Online ISBN: 978-3-540-78155-4
eBook Packages: Computer Science, Computer Science (R0)