Abstract
In this paper, we propose a new framework for view-independent action recognition that combines a view-dependent representation with a view-independent representation. The view-dependent representation narrows the set of candidate action labels before the view-independent representation is applied. We use the entropy of the silhouette's distance transform as the view-dependent representation, and the self-similarity matrix of the trajectories of feature points uniformly distributed over the human body as the view-independent representation. Experimental results show that the proposed method outperforms recent action recognition approaches while keeping computational cost low.
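The two-stage idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It rests on several assumptions: silhouettes are binary masks, the distance transform is a brute-force Euclidean one, and the view-dependent feature is the Shannon entropy of a 16-bin histogram of distance values inside the silhouette (the bin count is arbitrary). The self-similarity matrix (SSM) holds pairwise Euclidean distances between per-frame point configurations. All sequences are assumed to be resampled to a common length, and both stages use plain nearest-neighbour matching. The function names (`distance_transform`, `silhouette_entropy`, `self_similarity_matrix`, `hybrid_classify`) are hypothetical.

```python
import numpy as np


def distance_transform(mask):
    """Brute-force Euclidean distance transform (hypothetical helper).

    Each foreground pixel gets its distance to the nearest background pixel;
    background pixels stay 0. Uses O(F * B) memory, so only for small masks.
    """
    mask = np.asarray(mask, dtype=bool)
    dt = np.zeros(mask.shape, dtype=float)
    fg, bg = np.argwhere(mask), np.argwhere(~mask)
    if len(fg) == 0 or len(bg) == 0:
        return dt  # degenerate mask: no boundary to measure against
    diff = fg[:, None, :] - bg[None, :, :]
    dt[tuple(fg.T)] = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
    return dt


def silhouette_entropy(mask, bins=16):
    """Shannon entropy (bits) of the histogram of distance-transform values
    inside the silhouette. The histogram form and bin count are assumptions."""
    values = distance_transform(mask)[np.asarray(mask, dtype=bool)]
    if values.size == 0:
        return 0.0
    hist, _ = np.histogram(values, bins=bins)
    p = hist[hist > 0] / hist.sum()
    return float(np.sum(p * np.log2(1.0 / p)))


def self_similarity_matrix(trajectories):
    """SSM of point trajectories with shape (T frames, P points, D coords):
    entry (i, j) is the Euclidean distance between the point configurations
    at frames i and j."""
    x = np.asarray(trajectories, dtype=float).reshape(len(trajectories), -1)
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def hybrid_classify(query_entropy, query_ssm, train, n_labels=2):
    """Two-stage nearest-neighbour classifier (assumed matching scheme).

    train: list of (label, entropy_sequence, ssm), all resampled to length T.
    Stage 1 keeps the n_labels labels closest in the view-dependent feature;
    stage 2 picks the nearest SSM (Frobenius norm) among those labels only.
    """
    ranked = sorted(train, key=lambda s: np.linalg.norm(s[1] - query_entropy))
    shortlist = []
    for label, _, _ in ranked:
        if label not in shortlist:
            shortlist.append(label)
        if len(shortlist) == n_labels:
            break
    candidates = [s for s in train if s[0] in shortlist]
    best = min(candidates, key=lambda s: np.linalg.norm(s[2] - query_ssm))
    return best[0], shortlist
```

In this sketch, the cheap silhouette feature prunes the label set first. The costlier SSM comparison then runs only against training samples from the shortlisted classes. Under these assumptions, this pruning step is where a computational saving would come from.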
Cite this article
Hashemi, S.M., Rahmati, M. View-independent action recognition: a hybrid approach. Multimed Tools Appl 75, 6755–6775 (2016). https://doi.org/10.1007/s11042-015-2606-5
DOI: https://doi.org/10.1007/s11042-015-2606-5