
A visual attention-based method to address the Midas Touch problem existing in gesture-based interaction

  • Original Article
  • Published in The Visual Computer

Abstract

The “Midas Touch” problem, in which a system misinterprets unintentional hand movements as commands, has long been a difficult challenge in gesture-based interaction. This paper proposes a visual attention-based method that addresses the problem from the perspective of cognitive psychology. The paper makes three main contributions: (1) a visual attention-based parallel perception model that combines top-down and bottom-up attention, (2) a framework that performs dynamic gesture spotting and recognition simultaneously, and (3) a gesture toolkit that facilitates gesture design and development. Experimental results show that the proposed method performs well on both isolated and continuous gesture recognition tasks. Finally, we highlight the implications of this work for the design and development of gesture-based applications.
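To make the idea concrete, the sketch below shows one way an attention gate of this kind could sit in front of a recognizer: a bottom-up channel (stimulus-driven motion saliency) and a top-down channel (a task-driven prior over where the interacting hand is expected) run in parallel, and gesture spotting and recognition only proceed when the combined score indicates intentional engagement. This is a minimal illustration of the general approach, not the authors' implementation; the function names, the Gaussian prior, the equal channel weighting, and the fixed threshold are all assumptions made for the example.

```python
import numpy as np

def bottom_up_saliency(frame_motion):
    """Stimulus-driven channel: mean motion energy in the current frame."""
    return float(np.mean(np.abs(frame_motion)))

def top_down_weight(hand_pos, roi_center, roi_radius):
    """Task-driven channel: a Gaussian prior favoring hands near the
    region where the interface currently expects interaction."""
    d = np.linalg.norm(np.asarray(hand_pos) - np.asarray(roi_center))
    return float(np.exp(-((d / roi_radius) ** 2)))

def attention_score(frame_motion, hand_pos, roi_center, roi_radius, alpha=0.5):
    """Blend the two channels computed in parallel; alpha sets the balance
    between bottom-up and top-down attention."""
    return (alpha * bottom_up_saliency(frame_motion)
            + (1.0 - alpha) * top_down_weight(hand_pos, roi_center, roi_radius))

def dispatch_gesture(frame_motion, hand_pos, recognize,
                     roi_center=(0.5, 0.5), roi_radius=0.25, threshold=0.6):
    """Gate the recognizer with the attention score: gestures are spotted
    and classified only when the user appears intentionally engaged, so
    incidental hand movement never triggers a command (no Midas Touch)."""
    if attention_score(frame_motion, hand_pos, roi_center, roi_radius) >= threshold:
        return recognize(hand_pos)  # spotting and recognition run together
    return None                     # movement is treated as unintentional
```

Here `recognize` stands in for any gesture classifier (for example, an HMM-based spotter), and a real system would tune the threshold and channel weighting per application rather than use these placeholder values.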


Acknowledgments

We gratefully acknowledge financial support from the National Natural Science Foundation of China (No. 61202344); the Fundamental Research Funds for the Central Universities, Sun Yat-Sen University (No. 1209119); the Special Project on the Integration of Industry, Education and Research of Guangdong Province (No. 2012B091000062); and the Fundamental Research Funds for the Central Universities, Tongji University (Nos. 0600219052 and 0600219053). We would also like to express our appreciation to the editor and reviewers.

Author information

Correspondence to Huiyue Wu.


Cite this article

Wu, H., Wang, J. A visual attention-based method to address the Midas Touch problem existing in gesture-based interaction. Vis Comput 32, 123–136 (2016). https://doi.org/10.1007/s00371-014-1060-0
