Human action recognition based on scene semantics

Published in Multimedia Tools and Applications.

Abstract

Indoor security, like outdoor security, is a critical problem, and human action recognition in indoor areas remains an active research topic. Most studies on human action recognition ignore the semantic information of the scene, even though indoor environments contain rich semantics. Meanwhile, a depth sensor that provides both color and depth data is well suited to extracting the semantic context of human actions. Hence, this paper proposes an indoor action recognition method that uses the Kinect and exploits scene semantics. First, we propose a trajectory clustering algorithm for three-dimensional (3D) scenes that combines characteristics of people such as spatial location, movement direction, and speed. Based on the clustering results and the scene context, we derive a region-of-interest (ROI) extraction method for indoor scenes, and dynamic time warping (DTW) is used to analyze abnormal action sequences. Finally, 3D motion history image (3D-MHI) features computed from the color and depth data are combined with the semantic context of the scene to recognize human actions. In experiments on two datasets, the results demonstrate that our semantics-based method outperforms comparable methods.
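As a rough illustration of the abnormal-sequence step mentioned in the abstract, the sketch below shows classic DTW matching between action sequences. It is a minimal example under our own assumptions: the function names, the absolute-difference local cost, and the threshold are illustrative, not the paper's actual implementation.

```python
def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping (DTW) distance between two 1-D
    feature sequences, using absolute difference as the local cost.
    (Illustrative sketch; the paper's feature space may differ.)"""
    n, m = len(seq_a), len(seq_b)
    INF = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning the first i
    # elements of seq_a with the first j elements of seq_b
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

def is_abnormal(seq, templates, threshold=5.0):
    """Flag a sequence as abnormal when it is far (in DTW distance)
    from every stored template of normal actions. The threshold is a
    hypothetical value for illustration only."""
    return all(dtw_distance(seq, t) > threshold for t in templates)
```

Because DTW tolerates temporal stretching, two executions of the same action at different speeds can still align with low cost, which is why it is a natural fit for comparing action sequences of varying duration.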



Acknowledgements

This research was supported by the National Key Research and Development Program of China (No. 2016YFB0502204), the National Natural Science Foundation of China (No. 41301517), the Funds for the Central Universities (No. 413000010), the Open Fund of the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University (No. 16(03)), and the National Key Technology Research and Development Program (No. 2012BAH35B03).

Author information


Corresponding author

Correspondence to Wei Guo.


About this article


Cite this article

Hu, T., Zhu, X., Guo, W. et al. Human action recognition based on scene semantics. Multimed Tools Appl 78, 28515–28536 (2019). https://doi.org/10.1007/s11042-017-5496-x

