Abstract
This chapter presents a vision-based system for touch-free interaction with a display at a distance. A single camera is mounted on top of the screen, pointing towards the user. An attention mechanism allows the user to start the interaction and to control a screen pointer by moving their hand in a fist pose directed at the camera. On-screen items are chosen through a selection mechanism. Current sample applications include browsing video collections and viewing a gallery of 3D objects, which the user can rotate with hand motion. We include an up-to-date review of hand tracking methods and comment on the merits and shortcomings of previous approaches. For robustness, the proposed tracker uses multiple cues: appearance, color, and motion. Because the space of possible observation models is generally too large for exhaustive online search, we select models that are suitable for the particular tracking task at hand. During a training stage, various off-the-shelf trackers are evaluated. From this data, different methods of fusing them online are investigated, including parallel and cascaded tracker evaluation. For fist tracking, combining a small number of observers in a cascade yields an efficient algorithm, which is used in our gesture interface. The system has been on public display at conferences, where over a hundred users have engaged with it.
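The abstract contrasts parallel and cascaded evaluation of multiple observers. The sketch below illustrates that contrast in Python. It is not the authors' implementation. The three Gaussian "observers" are hypothetical stand-ins for motion, color and appearance cues, and the candidate grid, the `keep` sizes per stage and all function names are assumptions. The number of observer evaluations serves as a rough proxy for computational cost.

```python
import math
from typing import Callable, List, Sequence, Tuple

Candidate = Tuple[int, int]              # hypothetical (x, y) location of a fist hypothesis
Observer = Callable[[Candidate], float]  # score in (0, 1]; higher means more fist-like


def gaussian_observer(target: Candidate, sigma: float) -> Observer:
    """Toy observer (assumption): scores a candidate by its distance to a known target.
    A real observer would score image evidence instead; a larger sigma mimics a
    cheaper but less discriminative cue."""
    def obs(c: Candidate) -> float:
        d2 = (c[0] - target[0]) ** 2 + (c[1] - target[1]) ** 2
        return math.exp(-d2 / (2.0 * sigma ** 2))
    return obs


def parallel_fusion(cands: Sequence[Candidate], observers: List[Observer]):
    """Evaluate every observer on every candidate and multiply the scores
    (treating observers as conditionally independent)."""
    best, best_score, evals = None, -1.0, 0
    for c in cands:
        score = 1.0
        for obs in observers:
            score *= obs(c)
            evals += 1
        if score > best_score:
            best, best_score = c, score
    return best, evals


def cascade_fusion(cands: Sequence[Candidate], observers: List[Observer], keep: List[int]):
    """Evaluate observers in order; after stage i keep only the top keep[i]
    candidates by accumulated score, so later observers see fewer candidates."""
    scored = [(1.0, c) for c in cands]
    evals = 0
    for obs, k in zip(observers, keep):
        rescored = []
        for s, c in scored:
            rescored.append((s * obs(c), c))
            evals += 1
        rescored.sort(key=lambda t: t[0], reverse=True)
        scored = rescored[:k]
    return scored[0][1], evals


TARGET = (10, 12)
CANDIDATES = [(x, y) for x in range(20) for y in range(20)]  # 400 hypotheses
OBSERVERS = [gaussian_observer(TARGET, 6.0),   # stand-in for a coarse "motion" cue
             gaussian_observer(TARGET, 3.0),   # stand-in for a "color" cue
             gaussian_observer(TARGET, 1.5)]   # stand-in for an "appearance" cue

if __name__ == "__main__":
    print(parallel_fusion(CANDIDATES, OBSERVERS))                 # ((10, 12), 1200)
    print(cascade_fusion(CANDIDATES, OBSERVERS, [50, 10, 1]))     # ((10, 12), 460)
```

In this toy setting both schemes select the same location, but the cascade makes 460 observer evaluations instead of 1200 because each stage prunes the candidate set before the next observer runs. How much pruning is safe in practice depends on how reliable the early observers are, which is the kind of trade-off a training stage would need to measure.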
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Stenger, B., Woodley, T., Cipolla, R. (2010). A Vision-Based Remote Control. In: Cipolla, R., Battiato, S., Farinella, G.M. (eds) Computer Vision. Studies in Computational Intelligence, vol 285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12848-6_9
DOI: https://doi.org/10.1007/978-3-642-12848-6_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12847-9
Online ISBN: 978-3-642-12848-6
eBook Packages: Engineering, Engineering (R0)