DOI: 10.1145/2207676.2208303 · CHI Conference Proceedings · Research article

Instructing people for training gestural interactive systems

Published: 05 May 2012

ABSTRACT

Entertainment and gaming systems such as the Wii and Xbox Kinect have brought touchless, body-movement-based interfaces to the masses. Systems like these can estimate the movements of various body parts from raw inertial motion or depth sensor data. However, the interface developer is still left with the challenging task of creating a system that recognizes these movements as embodying meaning. The machine learning approach to this problem requires collecting data sets that contain the relevant body movements and their associated semantic labels. These data sets directly impact the accuracy and performance of the gesture recognition system and should ideally contain all natural variations of the movements associated with a gesture. This paper addresses the problem of collecting such gesture data sets. In particular, we investigate which semiotic modality of instruction is most appropriate for conveying to human subjects the movements the system developer needs them to perform. The results of our qualitative and quantitative analysis indicate that the choice of modality has a significant impact on the performance of the learnt gesture recognition system, particularly in terms of correctness and coverage.
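To make the pipeline the abstract describes concrete, here is a minimal sketch of training a gesture recognizer from a labelled movement data set: recorded body-movement sequences are reduced to fixed-length feature vectors and paired with semantic gesture labels to fit a classifier. Everything in the sketch is an illustrative assumption rather than the paper's actual method: the synthetic skeleton data, the mean/variance features, and the choice of a random-forest model are all stand-ins for whatever sensors, features, and learner a developer would use.

```python
# Minimal sketch of the dataset-to-recognizer pipeline.
# All data, features, and model choices here are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def extract_features(sequence: np.ndarray) -> np.ndarray:
    """Collapse a (frames x joints x 3) skeleton sequence into a
    fixed-length vector of per-coordinate means and variances."""
    flat = sequence.reshape(sequence.shape[0], -1)
    return np.concatenate([flat.mean(axis=0), flat.var(axis=0)])

# Hypothetical stand-in dataset: 200 recorded sequences, each
# 60 frames of 20 tracked joints, with one of 4 gesture labels.
rng = np.random.default_rng(0)
sequences = rng.normal(size=(200, 60, 20, 3))
labels = rng.integers(0, 4, size=200)

X = np.stack([extract_features(s) for s in sequences])
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")
```

The paper's point is that the quality of `sequences` and `labels`, i.e., whether the collected movements cover the natural variation of each gesture and are correctly labelled, bounds what any such classifier can achieve, regardless of the model chosen.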


Published in

CHI '12: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
May 2012, 3276 pages
ISBN: 978-1-4503-1015-4
DOI: 10.1145/2207676
Copyright © 2012 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
