DOI: 10.1145/2502081.2502103
Research Article

Online human gesture recognition from motion data streams

Published: 21 October 2013

Abstract

Online human gesture recognition has a wide range of applications in computer vision, especially in human-computer interaction. The recent introduction of cost-effective depth cameras has brought about a new wave of research on body-movement gesture recognition. Two major challenges remain: i) how to continuously recognize gestures from unsegmented streams, and ii) how to differentiate different styles of the same gesture from other types of gestures. In this paper, we address both problems with a new, effective and efficient feature extraction method that uses a dynamic matching approach to construct a feature vector for each frame; the resulting features are more sensitive to differences between gesture classes and less sensitive to variations within the same class. Comprehensive experiments on the MSRC-12 Kinect Gesture and MSR-Action3D datasets demonstrate superior performance over state-of-the-art approaches.
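
The abstract describes constructing a per-frame feature vector by dynamically matching the incoming motion stream against gesture exemplars. The following is a minimal, hypothetical sketch of that general idea, not the authors' implementation: it assumes skeleton frames are flattened joint-coordinate vectors, uses plain dynamic time warping (DTW) against one exemplar per gesture class, and the helper names (dtw_distance, frame_feature), window length, and exemplar setup are all illustrative assumptions.

    # Illustrative sketch only: per-frame features from dynamic matching of the
    # recent motion window against per-class gesture exemplars.
    import numpy as np

    def dtw_distance(a, b):
        """Dynamic time warping distance between two sequences of pose vectors."""
        n, m = len(a), len(b)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = np.linalg.norm(a[i - 1] - b[j - 1])  # frame-to-frame pose distance
                cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
        return cost[n, m]

    def frame_feature(history, exemplars, window=30):
        """Feature for the newest frame: DTW distances from the recent motion
        window to one exemplar per gesture class (hypothetical configuration)."""
        recent = np.asarray(history[-window:])
        return np.array([dtw_distance(recent, ex) for ex in exemplars])

    # Usage with synthetic data: 20 joints x 3 coordinates = 60 values per frame,
    # and one 40-frame exemplar for each of 12 gesture classes.
    rng = np.random.default_rng(0)
    stream = [rng.normal(size=60) for _ in range(50)]           # unsegmented frame stream
    exemplars = [rng.normal(size=(40, 60)) for _ in range(12)]  # per-class templates
    print(frame_feature(stream, exemplars).shape)               # -> (12,)

A frame-level classifier could then be trained on such vectors to label each incoming frame online; in practice the exemplars would be learned from training data rather than drawn at random.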

Supplementary Material

suppl.mov (mm022.wmv)
Supplemental video




Published In

MM '13: Proceedings of the 21st ACM international conference on Multimedia
October 2013
1166 pages
ISBN:9781450324045
DOI:10.1145/2502081
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 October 2013

Author Tags

  1. depth camera
  2. feature extraction
  3. gesture recognition

Qualifiers

  • Research-article

Conference

MM '13: ACM Multimedia Conference
October 21-25, 2013
Barcelona, Spain

Acceptance Rates

MM '13 paper acceptance rate: 47 of 235 submissions (20%)
Overall acceptance rate: 2,145 of 8,556 submissions (25%)

Article Metrics

  • Downloads (last 12 months): 26
  • Downloads (last 6 weeks): 3
Reflects downloads up to 17 Feb 2025

Cited By

  • (2024) Recognition and Scoring Physical Exercises via Temporal and Relative Analysis of Skeleton Nodes Extracted from the Kinect Sensor. Sensors, 24(20):6713. DOI: 10.3390/s24206713. Online publication date: 18-Oct-2024.
  • (2024) Early gesture detection in untrimmed streams. Pattern Recognition, 156(C). DOI: 10.1016/j.patcog.2024.110733. Online publication date: 18-Nov-2024.
  • (2024) Gesture-Based Machine Learning for Enhanced Autonomous Driving: A Novel Dataset and System Integration Approach. HCI International 2024 Posters, pages 247-256. DOI: 10.1007/978-3-031-61963-2_24. Online publication date: 8-Jun-2024.
  • (2022) A Sliding Window Based Approach With Majority Voting for Online Human Action Recognition using Spatial Temporal Graph Convolutional Neural Networks. Proceedings of the 2022 7th International Conference on Machine Learning Technologies, pages 155-163. DOI: 10.1145/3529399.3529425. Online publication date: 11-Mar-2022.
  • (2022) Fine-Grained Unsupervised Temporal Action Segmentation and Distributed Representation for Skeleton-Based Human Motion Analysis. IEEE Transactions on Cybernetics, 52(12):13411-13424. DOI: 10.1109/TCYB.2021.3132016. Online publication date: Dec-2022.
  • (2022) Online human action detection and anticipation in videos. Neurocomputing, 491(C):395-413. DOI: 10.1016/j.neucom.2022.03.069. Online publication date: 28-Jun-2022.
  • (2021) Content-Based Management of Human Motion Data: Survey and Challenges. IEEE Access, 9:64241-64255. DOI: 10.1109/ACCESS.2021.3075766. Online publication date: 2021.
  • (2020) Temporal Hierarchical Dictionary Guided Decoding for Online Gesture Segmentation and Recognition. IEEE Transactions on Image Processing, 29:9689-9702. DOI: 10.1109/TIP.2020.3028962. Online publication date: 2020.
  • (2020) Image-based Pose Representation for Action Recognition and Hand Gesture Recognition. 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), pages 532-539. DOI: 10.1109/FG47880.2020.00066. Online publication date: Nov-2020.
  • (2020) Oops! Predicting Unintentional Action in Video. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 916-926. DOI: 10.1109/CVPR42600.2020.00100. Online publication date: Jun-2020.
