Conditional Sequence Model for Context-Based Recognition of Gaze Aversion

  • Conference paper
Machine Learning for Multimodal Interaction (MLMI 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4892)

Abstract

Eye gaze and gesture form key conversational grounding cues that are used extensively in face-to-face interaction among people. To accurately recognize visual feedback during interaction, people often use contextual knowledge from previous and current events to anticipate when feedback is most likely to occur. In this paper, we investigate how dialog context from an embodied conversational agent (ECA) can improve visual recognition of eye gestures. We propose a new framework for contextual recognition based on Latent-Dynamic Conditional Random Field (LDCRF) models to learn the sub-structure and external dynamics of contextual cues. Our experiments show that adding contextual information improves visual recognition of eye gestures and demonstrate that the LDCRF model for context-based recognition of gaze aversion gestures outperforms Support Vector Machines, Hidden Markov Models, and Conditional Random Fields.

Editor information

Andrei Popescu-Belis, Steve Renals, Hervé Bourlard

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Morency, L.-P., Darrell, T. (2008). Conditional Sequence Model for Context-Based Recognition of Gaze Aversion. In: Popescu-Belis, A., Renals, S., Bourlard, H. (eds) Machine Learning for Multimodal Interaction. MLMI 2007. Lecture Notes in Computer Science, vol 4892. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78155-4_2

  • DOI: https://doi.org/10.1007/978-3-540-78155-4_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78154-7

  • Online ISBN: 978-3-540-78155-4

  • eBook Packages: Computer Science, Computer Science (R0)
