Abstract
A fast algorithm based on inverted indexes is proposed for multi-class action recognition. The approach represents an action as a sequence of action states, where each state is a cluster center of the extracted shape-motion features. First, the shape-motion features of a tracked actor are computed. Second, a state binary tree is built by hierarchically clustering the extracted features. The training videos are then represented as sequences of action states by searching the state binary tree. From the labeled state sequences, a state inverted index table and a state transition inverted index table are created. During testing, a new action video is first represented as a state sequence; the state and state-transition scores are then computed by querying the two inverted index tables. Combining these scores with weights trained on a validation set yields an action class score vector, and the recognized action class label is the index of the maximum component of that vector. Our key contribution is a fast multi-class action recognition approach based on two inverted index tables. Experiments on several challenging data sets confirm the performance of this approach.
Notes
The center of the action interest region lies on the vertical central axis of the bounding box produced by the automatic pedestrian localization method, and the side length of the region is proportional to the height of the bounding box.
The tree nodes are the cluster centers of the corresponding clusters.
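The state binary tree described above can be sketched with a recursive 2-means split, where every node stores the center of its cluster and a feature vector is mapped to a state by descending toward the closer child. This is a simplified, hypothetical sketch of hierarchical k-means; the paper's splitting criteria and stopping rules may differ.

```python
import numpy as np

def build_state_tree(features, min_size=2, depth=0, max_depth=8):
    """Recursively split a feature matrix with a 2-means step to form a
    binary tree. Each node stores its cluster center; leaves act as the
    action states."""
    center = features.mean(axis=0)
    if len(features) <= min_size or depth >= max_depth:
        return {"center": center, "children": None}
    # Farthest-point initialization for the 2-means split.
    c0 = features[np.linalg.norm(features - features[0], axis=1).argmax()]
    c1 = features[np.linalg.norm(features - c0, axis=1).argmax()]
    for _ in range(10):
        assign = (np.linalg.norm(features - c1, axis=1) <
                  np.linalg.norm(features - c0, axis=1))
        if assign.all() or (~assign).all():  # degenerate split: stop here
            return {"center": center, "children": None}
        c0 = features[~assign].mean(axis=0)
        c1 = features[assign].mean(axis=0)
    left = build_state_tree(features[~assign], min_size, depth + 1, max_depth)
    right = build_state_tree(features[assign], min_size, depth + 1, max_depth)
    return {"center": center, "children": (left, right)}

def nearest_state(tree, x, path=()):
    """Descend toward the child with the closer center; the reached
    leaf's path identifies the action state for feature vector x."""
    if tree["children"] is None:
        return path, tree["center"]
    left, right = tree["children"]
    if np.linalg.norm(x - left["center"]) <= np.linalg.norm(x - right["center"]):
        return nearest_state(left, x, path + (0,))
    return nearest_state(right, x, path + (1,))
```

Mapping a frame's feature vector to a state then costs only one root-to-leaf descent, i.e. a number of distance computations logarithmic in the number of states.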
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (61375038).
Cite this article
Pei, L., Ye, M., Xu, P. et al. Fast multi-class action recognition by querying inverted index tables. Multimed Tools Appl 74, 10801–10822 (2015). https://doi.org/10.1007/s11042-014-2207-8