ABSTRACT
Entertainment and gaming systems such as the Nintendo Wii and Xbox Kinect have brought touchless, body-movement-based interfaces to the masses. Systems like these can estimate the movements of various body parts from raw inertial-motion or depth-sensor data. However, the interface developer is still left with the challenging task of creating a system that recognizes these movements as embodying meaning. The machine learning approach to this problem requires the collection of data sets that contain the relevant body movements and their associated semantic labels. These data sets directly impact the accuracy and performance of the gesture recognition system and should ideally contain all natural variations of the movements associated with a gesture. This paper addresses the problem of collecting such gesture data sets. In particular, we investigate which semiotic modality of instruction is most appropriate for conveying to human subjects the movements the system developer needs them to perform. The results of our qualitative and quantitative analysis indicate that the choice of modality has a significant impact on the performance of the learnt gesture recognition system, particularly in terms of correctness and coverage.
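The pipeline the abstract describes, collecting labelled movement examples and then learning a recognizer from them, can be illustrated with a toy sketch. Everything below is hypothetical: the gesture labels, the two-dimensional "movement features", and the nearest-centroid classifier are stand-ins for illustration, not the method or data evaluated in the paper.

```python
import math
import random

random.seed(0)

def make_samples(label, centre, n, spread):
    # Each sample pairs a feature vector summarising a recorded body
    # movement with the semantic label the collector intended to elicit.
    # Natural variation is simulated here as Gaussian noise around a
    # prototype movement.
    return [([random.gauss(c, spread) for c in centre], label)
            for _ in range(n)]

# Training set: movements performed under controlled instructions.
train = (make_samples("wave", (1.0, 0.0), 20, 0.1)
         + make_samples("punch", (0.0, 1.0), 20, 0.1))
# Test set drawn with a wider spread, standing in for the natural
# variation a deployed system must cope with.
test = (make_samples("wave", (1.0, 0.0), 20, 0.4)
        + make_samples("punch", (0.0, 1.0), 20, 0.4))

def centroids(samples):
    # Learn one mean feature vector per gesture label.
    sums = {}
    for x, y in samples:
        acc, n = sums.setdefault(y, ([0.0] * len(x), 0))
        sums[y] = ([a + b for a, b in zip(acc, x)], n + 1)
    return {y: [a / n for a in acc] for y, (acc, n) in sums.items()}

def predict(model, x):
    # Assign the label whose centroid is nearest to the observed movement.
    return min(model, key=lambda y: math.dist(x, model[y]))

model = centroids(train)
correct = sum(predict(model, x) == y for x, y in test)
accuracy = correct / len(test)
```

In this sketch, accuracy over the wider-spread test set plays the role the abstract assigns to the training data: if the collected examples fail to cover the natural variation of a gesture, recognition of unseen performances degrades.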
Index Terms
- Instructing people for training gestural interactive systems