DOI: 10.1145/2502081.2502103
Research Article

Online human gesture recognition from motion data streams

Published: 21 October 2013

Abstract

Online human gesture recognition has a wide range of applications in computer vision, especially in human-computer interaction. The recent introduction of cost-effective depth cameras has brought about a new wave of research on body-movement gesture recognition. Two major challenges remain: i) how to continuously recognize gestures from unsegmented streams, and ii) how to differentiate different styles of the same gesture from other types of gestures. In this paper, we address both problems with a new, effective and efficient feature extraction method that uses a dynamic matching approach to construct a feature vector for each frame; the resulting features are more sensitive to differences between gesture classes and less sensitive to variations within the same class. Comprehensive experiments on the MSRC-12 Kinect Gesture and MSR-Action3D datasets demonstrate superior performance over state-of-the-art approaches.
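
The abstract describes constructing a per-frame feature vector by dynamically matching the incoming motion stream against gesture exemplars. The following is a minimal, hypothetical sketch of that general idea, not the authors' implementation: it assumes skeleton frames are flattened joint-coordinate vectors, uses plain dynamic time warping (DTW) against one exemplar per gesture class, and the helper names (dtw_distance, frame_feature), window length, and exemplar setup are all illustrative assumptions.

    # Illustrative sketch only: per-frame features from dynamic matching of the
    # recent motion window against per-class gesture exemplars.
    import numpy as np

    def dtw_distance(a, b):
        """Dynamic time warping distance between two sequences of pose vectors."""
        n, m = len(a), len(b)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = np.linalg.norm(a[i - 1] - b[j - 1])  # frame-to-frame pose distance
                cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
        return cost[n, m]

    def frame_feature(history, exemplars, window=30):
        """Feature for the newest frame: DTW distances from the recent motion
        window to one exemplar per gesture class (hypothetical configuration)."""
        recent = np.asarray(history[-window:])
        return np.array([dtw_distance(recent, ex) for ex in exemplars])

    # Usage with synthetic data: 20 joints x 3 coordinates = 60 values per frame,
    # and one 40-frame exemplar for each of 12 gesture classes.
    rng = np.random.default_rng(0)
    stream = [rng.normal(size=60) for _ in range(50)]           # unsegmented frame stream
    exemplars = [rng.normal(size=(40, 60)) for _ in range(12)]  # per-class templates
    print(frame_feature(stream, exemplars).shape)               # -> (12,)

A frame-level classifier could then be trained on such vectors to label each incoming frame online; in practice the exemplars would be learned from training data rather than drawn at random.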

Supplementary Material

suppl.mov (mm022.wmv)
Supplemental video




Published In

MM '13: Proceedings of the 21st ACM international conference on Multimedia
October 2013
1166 pages
ISBN:9781450324045
DOI:10.1145/2502081
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 October 2013

Author Tags

  1. depth camera
  2. feature extraction
  3. gesture recognition

Qualifiers

  • Research-article

Conference

MM '13: ACM Multimedia Conference
October 21-25, 2013
Barcelona, Spain

Acceptance Rates

MM '13 paper acceptance rate: 47 of 235 submissions (20%)
Overall acceptance rate: 2,145 of 8,556 submissions (25%)

Article Metrics

  • Downloads (last 12 months): 26
  • Downloads (last 6 weeks): 3
Reflects downloads up to 17 Feb 2025

Cited By

  • (2024) Recognition and Scoring Physical Exercises via Temporal and Relative Analysis of Skeleton Nodes Extracted from the Kinect Sensor. Sensors, 24(20):6713. DOI: 10.3390/s24206713. Online publication date: 18-Oct-2024.
  • (2024) Early gesture detection in untrimmed streams. Pattern Recognition, 156(C). DOI: 10.1016/j.patcog.2024.110733. Online publication date: 18-Nov-2024.
  • (2024) Gesture-Based Machine Learning for Enhanced Autonomous Driving: A Novel Dataset and System Integration Approach. HCI International 2024 Posters, pages 247-256. DOI: 10.1007/978-3-031-61963-2_24. Online publication date: 8-Jun-2024.
  • (2022) A Sliding Window Based Approach With Majority Voting for Online Human Action Recognition using Spatial Temporal Graph Convolutional Neural Networks. Proceedings of the 2022 7th International Conference on Machine Learning Technologies, pages 155-163. DOI: 10.1145/3529399.3529425. Online publication date: 11-Mar-2022.
  • (2022) Fine-Grained Unsupervised Temporal Action Segmentation and Distributed Representation for Skeleton-Based Human Motion Analysis. IEEE Transactions on Cybernetics, 52(12):13411-13424. DOI: 10.1109/TCYB.2021.3132016. Online publication date: Dec-2022.
  • (2022) Online human action detection and anticipation in videos. Neurocomputing, 491(C):395-413. DOI: 10.1016/j.neucom.2022.03.069. Online publication date: 28-Jun-2022.
  • (2021) Content-Based Management of Human Motion Data: Survey and Challenges. IEEE Access, 9:64241-64255. DOI: 10.1109/ACCESS.2021.3075766. Online publication date: 2021.
  • (2020) Temporal Hierarchical Dictionary Guided Decoding for Online Gesture Segmentation and Recognition. IEEE Transactions on Image Processing, 29:9689-9702. DOI: 10.1109/TIP.2020.3028962. Online publication date: 2020.
  • (2020) Image-based Pose Representation for Action Recognition and Hand Gesture Recognition. 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), pages 532-539. DOI: 10.1109/FG47880.2020.00066. Online publication date: Nov-2020.
  • (2020) Oops! Predicting Unintentional Action in Video. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 916-926. DOI: 10.1109/CVPR42600.2020.00100. Online publication date: Jun-2020.
