Human action recognition based on scene semantics

Published in Multimedia Tools and Applications.

Abstract

Indoor security, like outdoor security, is a critical problem, and human action recognition in indoor areas remains an active research topic. Most studies on human action recognition ignore the semantic information of the scene, even though indoor environments contain rich semantics. Meanwhile, a depth sensor that provides both color and depth data is well suited to extracting the semantic context of human actions. Hence, this paper proposes an indoor action recognition method that uses the Kinect and exploits scene semantics. First, we propose a trajectory clustering algorithm for three-dimensional (3D) scenes that combines characteristics of people such as spatial location, movement direction, and speed. Based on the clustering results and the scene context, we derive a region-of-interest (ROI) extraction method for indoor scenes, and dynamic time warping (DTW) is used to analyze abnormal action sequences. Finally, 3D motion history image (3D-MHI) features computed from the color and depth data are combined with the semantic context of the scene to recognize human actions. In experiments on two datasets, the results demonstrate that our semantics-based method outperforms comparable methods.
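As a rough illustration of the abnormal-sequence step mentioned in the abstract, the sketch below shows classic DTW matching between action sequences. It is a minimal example under our own assumptions: the function names, the absolute-difference local cost, and the threshold are illustrative, not the paper's actual implementation.

```python
def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping (DTW) distance between two 1-D
    feature sequences, using absolute difference as the local cost.
    (Illustrative sketch; the paper's feature space may differ.)"""
    n, m = len(seq_a), len(seq_b)
    INF = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning the first i
    # elements of seq_a with the first j elements of seq_b
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

def is_abnormal(seq, templates, threshold=5.0):
    """Flag a sequence as abnormal when it is far (in DTW distance)
    from every stored template of normal actions. The threshold is a
    hypothetical value for illustration only."""
    return all(dtw_distance(seq, t) > threshold for t in templates)
```

Because DTW tolerates temporal stretching, two executions of the same action at different speeds can still align with low cost, which is why it is a natural fit for comparing action sequences of varying duration.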



Acknowledgements

This research was supported by the National Key Research and Development Program of China (No. 2016YFB0502204), the National Natural Science Foundation of China (No. 41301517), the Funds for the Central Universities (No. 413000010), the Open Fund of the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University (No. 16(03)), and the National Key Technology Research and Development Program (No. 2012BAH35B03).

Author information


Corresponding author

Correspondence to Wei Guo.


About this article


Cite this article

Hu, T., Zhu, X., Guo, W. et al. Human action recognition based on scene semantics. Multimed Tools Appl 78, 28515–28536 (2019). https://doi.org/10.1007/s11042-017-5496-x

