A Vision-Based Remote Control

  • Chapter
Computer Vision

Part of the book series: Studies in Computational Intelligence (SCI, volume 285)

Abstract

This chapter presents a vision-based system for touch-free interaction with a display at a distance. A single camera is fixed on top of the screen and points towards the user. An attention mechanism allows the user to start the interaction and control a screen pointer by moving their hand in a fist pose directed at the camera. On-screen items can be chosen by a selection mechanism. Current sample applications include browsing video collections and viewing a gallery of 3D objects, which the user can rotate with their hand motion. We include an up-to-date review of hand tracking methods and comment on the merits and shortcomings of previous approaches. The proposed tracker uses multiple cues (appearance, color, and motion) for robustness. As the space of possible observation models is generally too large for exhaustive online search, we select models that are suitable for the particular tracking task at hand. During a training stage, various off-the-shelf trackers are evaluated. From these data, different methods of fusing them online are investigated, including parallel and cascaded tracker evaluation. For the case of fist tracking, combining a small number of observers in a cascade results in an efficient algorithm that is used in our gesture interface. The system has been on public display at conferences, where over a hundred users have engaged with it.
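
The abstract contrasts parallel and cascaded evaluation of several observers. The following is only a minimal sketch of that distinction, not the authors' algorithm: it assumes each observer returns a position and a confidence in [0, 1], and that a cascade stops at the first observer whose confidence reaches a per-observer threshold. All names (`Estimate`, `cascade_track`, `parallel_track`, `make_stub`) and the threshold values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Position = Tuple[float, float]


@dataclass
class Estimate:
    position: Position
    confidence: float  # assumed to lie in [0, 1]


# Hypothetical observer interface: (frame, previous position) -> estimate.
Observer = Callable[[object, Position], Estimate]


def cascade_track(
    frame: object,
    prev: Position,
    observers: Sequence[Tuple[Observer, float]],
) -> Optional[Estimate]:
    """Evaluate observers in order; return the first confident estimate.

    Later observers are only run when earlier ones fall below their
    threshold, so the cheapest observer should come first.
    """
    for observe, threshold in observers:
        estimate = observe(frame, prev)
        if estimate.confidence >= threshold:
            return estimate
    return None  # every observer failed; the caller may re-detect


def parallel_track(
    frame: object,
    prev: Position,
    observers: Sequence[Tuple[Observer, float]],
) -> Optional[Estimate]:
    """Evaluate every observer and keep the most confident estimate."""
    estimates = [observe(frame, prev) for observe, _ in observers]
    if not estimates:
        return None
    return max(estimates, key=lambda e: e.confidence)


def make_stub(confidence: float, dx: float) -> Observer:
    """Stand-in observer with fixed confidence, for demonstration only."""
    def observe(frame: object, prev: Position) -> Estimate:
        return Estimate((prev[0] + dx, prev[1]), confidence)
    return observe


if __name__ == "__main__":
    observers = [(make_stub(0.3, 1.0), 0.5), (make_stub(0.8, 2.0), 0.5)]
    print(cascade_track(None, (0.0, 0.0), observers))
    print(parallel_track(None, (0.0, 0.0), observers))
```

In this sketch the cascade costs as little as one observer call per frame, whereas the parallel variant always runs every observer; that difference is the kind of efficiency trade-off the abstract refers to.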

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Stenger, B., Woodley, T., Cipolla, R. (2010). A Vision-Based Remote Control. In: Cipolla, R., Battiato, S., Farinella, G.M. (eds) Computer Vision. Studies in Computational Intelligence, vol 285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12848-6_9

  • DOI: https://doi.org/10.1007/978-3-642-12848-6_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12847-9

  • Online ISBN: 978-3-642-12848-6

  • eBook Packages: Engineering, Engineering (R0)
