Abstract
Indoor security is as critical a problem as outdoor security, and human action recognition in indoor environments remains an active research topic. Most studies on human action recognition ignore the semantic information of the scene, even though indoor scenes are rich in semantics. Meanwhile, depth sensors that capture both color and depth data are well suited to extracting the semantic context of human actions. This paper therefore proposes an indoor action recognition method that uses the Kinect and is based on scene semantics. First, we propose a trajectory clustering algorithm for three-dimensional (3D) scenes that combines characteristics of people such as spatial location, movement direction, and speed. Based on the clustering results and the scene context, we derive a region of interest (ROI) extraction method for indoor scenes, and dynamic time warping (DTW) is used to analyze abnormal action sequences. Finally, color- and depth-data-based 3D motion history image (3D-MHI) features are combined with the semantic context of the scene to recognize human actions. In experiments on two datasets, the results demonstrate that our semantics-based method outperforms competing methods.
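For illustration only (a minimal sketch, not the authors' implementation), the following Python snippet shows how the DTW distance mentioned above can compare two variable-length action-feature sequences, so that an observed sequence can be scored against an abnormal-action template; the feature vectors and sequences here are hypothetical placeholders.

import numpy as np

def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping with Euclidean local cost between frame features."""
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(np.asarray(seq_a[i - 1]) - np.asarray(seq_b[j - 1]))
            # Accumulate cost along the cheapest warping path (match, insertion, or deletion).
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# Example: score a short observed sequence against an abnormal-action template
# (both are made-up 2-D feature sequences of different lengths).
template = [[0.0, 0.1], [0.2, 0.4], [0.5, 0.9]]
observed = [[0.0, 0.0], [0.1, 0.2], [0.3, 0.5], [0.5, 0.8]]
print(dtw_distance(template, observed))

A small distance indicates the observed sequence warps closely onto the template; a threshold on this score is one simple way such a comparison can flag abnormal sequences.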
Acknowledgements
This research was supported by the National Key Research and Development Program of China (No. 2016YFB0502204), the National Natural Science Foundation of China (No. 41301517), the Funds for the Central Universities (No. 413000010), the Open Fund of the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University (No. 16(03)), and the National Key Technology Research and Development Program (No. 2012BAH35B03).
Cite this article
Hu, T., Zhu, X., Guo, W. et al. Human action recognition based on scene semantics. Multimed Tools Appl 78, 28515–28536 (2019). https://doi.org/10.1007/s11042-017-5496-x