ABSTRACT
Entertainment and gaming systems such as the Nintendo Wii and Xbox Kinect have brought touchless, body-movement-based interfaces to the masses. Systems like these can estimate the movements of various body parts from raw inertial-motion or depth-sensor data. However, the interface developer is still left with the challenging task of creating a system that recognizes these movements as embodying meaning. The machine learning approach to this problem requires the collection of data sets that contain the relevant body movements and their associated semantic labels. These data sets directly impact the accuracy and performance of the gesture recognition system and should ideally contain all natural variations of the movements associated with a gesture. This paper addresses the problem of collecting such gesture data sets. In particular, we investigate which semiotic modality of instruction is most appropriate for conveying to human subjects the movements the system developer needs them to perform. The results of our qualitative and quantitative analysis indicate that the choice of modality has a significant impact on the performance of the learnt gesture recognition system, particularly in terms of correctness and coverage.
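The pipeline the abstract describes, collecting labelled movement examples and then learning a recognizer from them, can be illustrated with a toy sketch. Everything below is hypothetical: the gesture labels, the two-dimensional "movement features", and the nearest-centroid classifier are stand-ins for illustration, not the method or data evaluated in the paper.

```python
import math
import random

random.seed(0)

def make_samples(label, centre, n, spread):
    # Each sample pairs a feature vector summarising a recorded body
    # movement with the semantic label the collector intended to elicit.
    # Natural variation is simulated here as Gaussian noise around a
    # prototype movement.
    return [([random.gauss(c, spread) for c in centre], label)
            for _ in range(n)]

# Training set: movements performed under controlled instructions.
train = (make_samples("wave", (1.0, 0.0), 20, 0.1)
         + make_samples("punch", (0.0, 1.0), 20, 0.1))
# Test set drawn with a wider spread, standing in for the natural
# variation a deployed system must cope with.
test = (make_samples("wave", (1.0, 0.0), 20, 0.4)
        + make_samples("punch", (0.0, 1.0), 20, 0.4))

def centroids(samples):
    # Learn one mean feature vector per gesture label.
    sums = {}
    for x, y in samples:
        acc, n = sums.setdefault(y, ([0.0] * len(x), 0))
        sums[y] = ([a + b for a, b in zip(acc, x)], n + 1)
    return {y: [a / n for a in acc] for y, (acc, n) in sums.items()}

def predict(model, x):
    # Assign the label whose centroid is nearest to the observed movement.
    return min(model, key=lambda y: math.dist(x, model[y]))

model = centroids(train)
correct = sum(predict(model, x) == y for x, y in test)
accuracy = correct / len(test)
```

In this sketch, accuracy over the wider-spread test set plays the role the abstract assigns to the training data: if the collected examples fail to cover the natural variation of a gesture, recognition of unseen performances degrades.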
Index Terms
- Instructing people for training gestural interactive systems