Abstract
Action recognition is of great academic research value in computer vision and has considerable commercial potential and broad market application prospects. To improve action recognition accuracy, this paper proposes two dynamic descriptors based on dense trajectories. First, to capture the local positions where an action occurs, dense sampling is performed in motion regions obtained by constraining and clustering the optical flow. Second, the motion corners of the object are selected as feature points and tracked to obtain motion trajectories. Finally, gradient information and optical-flow-gradient information are extracted from the video cube centered on each trajectory, and auto-correlation and normalization are applied to both to obtain two dynamic descriptors: 3D histograms of oriented gradients in trajectory-centered cube auto-correlation (3D-TCCHOGAC) and 3D histograms of oriented optical flow gradients auto-correlation (3D-HOOFGAC). These descriptors resist, to a certain degree, the interference caused by camera motion and complex backgrounds. However, the diversity of realistic videos means that neither dynamic nor static descriptors alone can achieve accurate action classification, so a new framework is proposed in which dynamic and static descriptors are fused and complement each other to further improve recognition accuracy. Using leave-one-out cross-validation, the method achieves accuracies of 100 % and 96.00 % on the Weizmann and UCF-Sports datasets, and using four-fold cross-validation it achieves 97.17 % and 88.23 % on the KTH and YouTube datasets, outperforming the compared methods in the literature.
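As a rough illustration of the pipeline described above, the following Python/OpenCV sketch covers the first two steps: localizing motion regions by constraining and clustering dense optical flow, then detecting and tracking motion corners to form trajectories. It assumes Farnebäck dense flow and DBSCAN clustering (both algorithms appear in the paper's references); every threshold, parameter value, and function name here is an illustrative assumption rather than the authors' exact configuration.

import cv2
import numpy as np
from sklearn.cluster import DBSCAN

def motion_regions(prev_gray, gray, mag_thresh=1.0):
    # Dense Farneback optical flow between consecutive grayscale frames.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    ys, xs = np.nonzero(mag > mag_thresh)       # constrain: keep moving pixels only
    if len(xs) == 0:
        return []
    pts = np.stack([xs, ys], axis=1)
    # Density-based clustering groups moving pixels into motion regions.
    labels = DBSCAN(eps=5, min_samples=20).fit_predict(pts)
    regions = []
    for lbl in set(labels) - {-1}:              # label -1 is DBSCAN noise
        x, y, w, h = cv2.boundingRect(pts[labels == lbl].astype(np.int32))
        regions.append((x, y, w, h))
    return regions

def track_motion_corners(frames, track_len=15):
    # Detect corners only inside the motion regions, then track them with
    # pyramidal Lucas-Kanade to form short motion trajectories.
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    nxt = cv2.cvtColor(frames[1], cv2.COLOR_BGR2GRAY)
    mask = np.zeros_like(prev)
    for x, y, w, h in motion_regions(prev, nxt):
        mask[y:y + h, x:x + w] = 255            # sample densely in motion regions
    pts = cv2.goodFeaturesToTrack(prev, maxCorners=500, qualityLevel=0.01,
                                  minDistance=5, mask=mask)
    if pts is None:
        return []
    tracks = [[tuple(p.ravel())] for p in pts]
    prev_gray = prev
    for frame in frames[1:track_len + 1]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
        for trk, p, ok in zip(tracks, pts, status.ravel()):
            if ok:
                trk.append(tuple(p.ravel()))
        prev_gray = gray
    # Each surviving track would seed a trajectory-centered video cube from
    # which the 3D-TCCHOGAC and 3D-HOOFGAC descriptors are then computed.
    return tracks

In this sketch, the descriptor stage (3D gradient and optical-flow-gradient histograms with auto-correlation and normalization) is deliberately omitted; the returned tracks only mark where those trajectory-centered cubes would be placed.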
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (Grant No. 61072110) and the Science and Technology Overall Innovation Project of Shaanxi Province (Grant No. 2013KTZB03-03-03).
Cite this article
Tong, M., Wang, H., Tian, W. et al. Action recognition new framework with robust 3D-TCCHOGAC and 3D-HOOFGAC. Multimed Tools Appl 76, 3011–3030 (2017). https://doi.org/10.1007/s11042-016-3279-4