
Fast multi-class action recognition by querying inverted index tables

  • Published in: Multimedia Tools and Applications

Abstract

A fast inverted-index-based algorithm is proposed for multi-class action recognition. This approach represents an action as a sequence of action states, where the action states are the cluster centers of extracted shape-motion features. First, we compute the shape-motion features of a tracked actor. Second, a state binary tree is built by hierarchically clustering the extracted features. The training videos are then represented as sequences of action states by searching the state binary tree. Based on the labeled state sequences, we create a state inverted index table and a state transition inverted index table. During testing, after a new action video is represented as a state sequence, the state and state-transition scores are computed by querying the inverted index tables. With a weight trained on the validation set, we obtain an action-class score vector; the recognized action class label is the index of the maximum component of this vector. Our key contribution is a fast multi-class action recognition approach based on two inverted index tables. Experiments on several challenging data sets confirm the performance of this approach.
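The querying pipeline in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the integer state IDs, the per-class count tables, and the single mixing weight `w` are assumptions made for the sketch (the paper trains its weights on a validation set).

```python
from collections import defaultdict

def build_tables(train_sequences):
    """Build a state inverted index and a state-transition inverted index.

    train_sequences: list of (class_label, [state_id, ...]) pairs,
    where each state_id identifies a cluster center (an action state).
    Each table maps a key (a state, or a pair of consecutive states)
    to per-class occurrence counts.
    """
    state_index = defaultdict(lambda: defaultdict(int))
    trans_index = defaultdict(lambda: defaultdict(int))
    for label, seq in train_sequences:
        for s in seq:
            state_index[s][label] += 1
        for a, b in zip(seq, seq[1:]):          # consecutive state transitions
            trans_index[(a, b)][label] += 1
    return state_index, trans_index

def classify(seq, state_index, trans_index, classes, w=0.5):
    """Score each class by querying the two tables; return the argmax label.

    w blends the state score and the transition score (illustrative choice).
    """
    score = {c: 0.0 for c in classes}
    for s in seq:
        for c, n in state_index.get(s, {}).items():
            score[c] += w * n
    for pair in zip(seq, seq[1:]):
        for c, n in trans_index.get(pair, {}).items():
            score[c] += (1.0 - w) * n
    return max(score, key=score.get)
```

Because each lookup is a hash-table query, classifying a sequence of length n costs O(n) table accesses regardless of the number of training videos, which is the source of the speed claimed in the title.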


Notes

  1. The action interest region is a square. Its center lies on the vertical central axis of the bounding box produced by the automatic pedestrian localization method, and its side length is proportional to the height of that bounding box.

  2. The tree nodes are the cluster centers of the corresponding clusters.
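The construction in Note 1 can be sketched as below. The (x, y, w, h) box format, the choice to center the square on the box center, and the proportionality constant k are illustrative assumptions; the note fixes only that the center lies on the vertical central axis and that the side length is proportional to the box height.

```python
def action_interest_region(bbox, k=1.2):
    """Return a square action interest region for a person bounding box.

    bbox: (x, y, w, h) with (x, y) the top-left corner.
    k:    side length as a fraction of the bounding-box height
          (illustrative value, not specified by the paper).
    """
    x, y, w, h = bbox
    side = k * h                   # side proportional to bbox height
    cx = x + w / 2.0               # point on the vertical central axis
    cy = y + h / 2.0
    return (cx - side / 2.0, cy - side / 2.0, side, side)
```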


Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (61375038).

Author information

Correspondence to Mao Ye.


About this article


Cite this article

Pei, L., Ye, M., Xu, P. et al. Fast multi-class action recognition by querying inverted index tables. Multimed Tools Appl 74, 10801–10822 (2015). https://doi.org/10.1007/s11042-014-2207-8

