
A Temporal Order Modeling Approach to Human Action Recognition from Multimodal Sensor Data

Published: 06 March 2017

Abstract

From wearable devices to depth cameras, researchers have exploited various multimodal data to recognize human actions for applications such as video gaming, education, and healthcare. Although many successful techniques have been presented in the literature, most current approaches focus on statistical or local spatiotemporal features and do not explicitly explore the temporal dynamics of the sensor data. However, human action data contain rich temporal structure information that can characterize the unique underlying patterns of different action categories. From this perspective, we propose a novel temporal order modeling approach to human action recognition. Specifically, we explore subspace projections to extract the latent temporal patterns from different human action sequences. The temporal order between these patterns is compared, and the index of the pattern that appears first is used to encode the entire sequence. This process is repeated multiple times and produces a compact feature vector representing the temporal dynamics of the sequence. Human action recognition can then be efficiently solved by a nearest neighbor search based on the Hamming distance between these compact feature vectors. We further introduce a sequential optimization algorithm to learn optimized projections that preserve the pairwise label similarity of the action sequences. Experimental results on two public human action datasets demonstrate the superior performance of the proposed technique in both accuracy and efficiency.
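The encoding and matching steps described above can be sketched in Python as follows. This is a minimal illustration under several assumptions that the abstract does not state: the projections are drawn at random (a stand-in for the learned projections produced by the sequential optimization, which is not reproduced here), a latent pattern is taken to "appear" at the frame where its projection peaks, and each code is the integer index of the winning pattern within its group. All function names (`make_projections`, `first_take_all_code`, `encode`, `hamming`, `nearest_neighbor_label`) are hypothetical.

```python
import random
from typing import List

Frames = List[List[float]]  # one action sequence: T frames x D features
Group = List[List[float]]   # K projection vectors, each of length D


def make_projections(dim: int, k: int, num_codes: int, seed: int = 0) -> List[Group]:
    """Draw num_codes groups of k Gaussian random projection vectors.

    Hypothetical stand-in for the learned projections; the paper's
    sequential optimization is not reproduced here.
    """
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
            for _ in range(num_codes)]


def first_take_all_code(seq: Frames, group: Group) -> int:
    """Return the index of the latent pattern that appears first in time.

    Assumption: a pattern 'appears' at the frame where its projection peaks.
    """
    peak_times = []
    for w in group:
        # Project every frame onto w, giving a 1-D latent temporal pattern.
        series = [sum(x * wi for x, wi in zip(frame, w)) for frame in seq]
        # max() returns the first maximum, so ties resolve to the earliest frame.
        peak_times.append(max(range(len(series)), key=series.__getitem__))
    # The pattern whose peak comes earliest wins; ties go to the lower index.
    return min(range(len(group)), key=peak_times.__getitem__)


def encode(seq: Frames, projections: List[Group]) -> List[int]:
    """Repeat the first-appearance comparison once per projection group."""
    return [first_take_all_code(seq, group) for group in projections]


def hamming(a: List[int], b: List[int]) -> int:
    """Number of positions at which two code vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbor_label(query: List[int], gallery: List[List[int]],
                           labels: List[str]) -> str:
    """Label of the gallery code closest to the query in Hamming distance."""
    best = min(range(len(gallery)), key=lambda i: hamming(query, gallery[i]))
    return labels[best]


if __name__ == "__main__":
    group = [[1.0, 0.0], [0.0, 1.0]]  # two axis-aligned projections
    forward = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    print(first_take_all_code(forward, group))        # 1: pattern 1 peaks first
    print(first_take_all_code(forward[::-1], group))  # 0: reversing time flips it
```

The small demo shows why such a code captures temporal order: the same frames played in reverse yield a different code, which a purely statistical feature (for example, a per-dimension mean) would not distinguish. Storing each code as an integer in `[0, K)` and counting mismatched positions keeps matching cheap; packing codes into bits is another option, but it would change how distances are counted.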



    • Published in

      ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 13, Issue 2
      May 2017
      226 pages
      ISSN:1551-6857
      EISSN:1551-6865
      DOI:10.1145/3058792

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 6 March 2017
      • Accepted: 1 December 2016
      • Revised: 1 November 2016
      • Received: 1 July 2016
      Published in TOMM Volume 13, Issue 2


      Qualifiers

      • research-article
      • Research
      • Refereed
