Abstract.
We present a vision system for human-machine interaction based on a small wearable camera mounted on glasses. The camera views the area in front of the user, especially the hands. To evaluate hand movements for pointing gestures and to recognise object references, we introduce an approach that integrates bottom-up generated feature maps with top-down propagated recognition results. Modules for context-free focus of attention work in parallel with the hand gesture recognition. In contrast to other approaches, the two branches are fused at the sub-symbolic level. This facilitates both the integration of different modalities and the generation of auditory feedback.
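Fusion "at the sub-symbolic level" means the two branches are combined as pixel-wise activation maps rather than as symbolic object labels. The sketch below illustrates that idea under stated assumptions and is not the authors' method: it assumes a weighted linear combination of maps rescaled to [0, 1], and the cone-shaped `pointing_map`, its `sigma_angle` parameter, the weights and all function names are hypothetical choices made for illustration. The focus of attention is taken as the maximum of the fused map.

```python
import numpy as np


def normalize(m):
    """Linearly rescale a map to [0, 1]; a constant map becomes all zeros."""
    m = np.asarray(m, dtype=float)
    lo, hi = m.min(), m.max()
    if hi - lo < 1e-12:
        return np.zeros_like(m)
    return (m - lo) / (hi - lo)


def pointing_map(shape, fingertip, direction, sigma_angle=0.3):
    """Hypothetical top-down map: a soft cone along the pointing direction.

    fingertip is (row, col) and direction is (d_row, d_col). A pixel whose
    offset from the fingertip deviates by angle theta from the pointing
    direction gets activation exp(-theta^2 / (2 * sigma_angle^2)).
    """
    rows, cols = np.indices(shape)
    dy = rows - fingertip[0]
    dx = cols - fingertip[1]
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    dist = np.hypot(dy, dx)
    cos = (dy * d[0] + dx * d[1]) / np.maximum(dist, 1e-9)
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    m = np.exp(-0.5 * (theta / sigma_angle) ** 2)
    m[dist == 0] = 0.0  # the fingertip itself is not a target
    return m


def fuse(feature_maps, top_down, weights, td_weight):
    """Weighted sum of normalized bottom-up maps plus the top-down map."""
    total = sum(w * normalize(f) for w, f in zip(weights, feature_maps))
    return normalize(total + td_weight * normalize(top_down))


def focus_of_attention(saliency):
    """Return the (row, col) of the most salient pixel."""
    r, c = np.unravel_index(np.argmax(saliency), saliency.shape)
    return int(r), int(c)


# Two candidate objects; the one at (15, 15) is slightly more salient.
f = np.zeros((20, 20))
f[15, 15], f[5, 15] = 1.0, 0.8
print(focus_of_attention(fuse([f], np.zeros_like(f), [1.0], 0.0)))  # (15, 15)

# A hand at (10, 2) pointing towards (5, 15) shifts the focus there.
td = pointing_map(f.shape, fingertip=(10, 2), direction=(-5, 13))
print(focus_of_attention(fuse([f], td, [1.0], 0.5)))  # (5, 15)
```

Because the top-down map only adds activation, the bottom-up branch still yields a usable focus of attention when no gesture is present (a top-down weight of zero), while a pointing gesture can shift the focus from the most conspicuous object to the referenced one.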
Additional information
Published online: 5 October 2004
Robert Rae: Now at PerFact Innovation, Lampingstr. 8, 33615 Bielefeld, Germany
About this article
Cite this article
Heidemann, G., Rae, R., Bekel, H. et al. Integrating context-free and context-dependent attentional mechanisms for gestural object reference. Machine Vision and Applications 16, 64–73 (2004). https://doi.org/10.1007/s00138-004-0157-2
DOI: https://doi.org/10.1007/s00138-004-0157-2