
A spatial-temporal iterative tensor decomposition technique for action and gesture recognition

Multimedia Tools and Applications

Abstract

Classification of video sequences is an important task with many applications in video search and action recognition. Unlike some traditional approaches that transform the original video sequences into visual feature vectors, tensor-based methods classify video sequences using a natural representation of the original data. However, one obvious limitation of tensor-based methods is that the input video sequences usually have to be preprocessed to a unified temporal length. In this paper, we propose a technique for classifying video sequences of unequal temporal length, namely Spatial-Temporal Iterative Tensor Decomposition (S-TITD), which encodes them to a uniform length. The proposed framework contains two primary steps. We first represent each original video sequence as a third-order tensor and perform Tucker-2 decomposition to obtain a reduced-dimension core tensor. Then we encode the third mode of the core tensor to a uniform length by adaptively selecting the most informative slices. Notably, these two steps are embedded in a dynamic learning framework so that the proposed method can update its results over time. We conduct a series of experiments on three public datasets for gesture and action recognition, and the experimental results show that the proposed S-TITD approach achieves better performance than state-of-the-art algorithms.
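The two steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how Tucker-2 is computed or how slice informativeness is measured, so the sketch assumes a truncated HOSVD on the two spatial modes and uses the Frobenius norm of each temporal slice as a stand-in criterion. The function names (`unfold`, `tucker2`, `select_slices`, `encode`) and all parameters are hypothetical, and the dynamic update step is omitted.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring `mode` to the front and flatten the other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def tucker2(video, rank_h, rank_w):
    """Tucker-2 via truncated HOSVD (an assumption, not the paper's solver).

    Only the two spatial modes of the height x width x time tensor are
    projected; the temporal mode keeps its original length.
    """
    U1 = np.linalg.svd(unfold(video, 0), full_matrices=False)[0][:, :rank_h]
    U2 = np.linalg.svd(unfold(video, 1), full_matrices=False)[0][:, :rank_w]
    # core[r, s, t] = sum over h, w of video[h, w, t] * U1[h, r] * U2[w, s]
    core = np.einsum('hwt,hr,ws->rst', video, U1, U2)
    return core, U1, U2

def select_slices(core, length):
    """Keep the `length` temporal slices with the largest Frobenius norm.

    The energy criterion is an assumed proxy for "most informative".
    Selected slices stay in their original temporal order.
    """
    if core.shape[2] < length:
        raise ValueError("sequence is shorter than the target length")
    energy = np.linalg.norm(core, axis=(0, 1))   # one value per time step
    keep = np.sort(np.argsort(energy)[-length:])
    return core[:, :, keep]

def encode(video, rank_h, rank_w, length):
    """Map a variable-length video to a fixed-size rank_h x rank_w x length tensor."""
    core, _, _ = tucker2(video, rank_h, rank_w)
    return select_slices(core, length)
```

Under these assumptions, two clips of shape 16 x 12 x 20 and 16 x 12 x 35 passed through `encode(clip, 4, 3, 8)` both yield 4 x 3 x 8 tensors, which is the uniform-length property the abstract relies on.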



Author information

Corresponding author

Correspondence to Peiguang Jing.


Cite this article

Su, Y., Wang, H., Jing, P. et al. A spatial-temporal iterative tensor decomposition technique for action and gesture recognition. Multimed Tools Appl 76, 10635–10652 (2017). https://doi.org/10.1007/s11042-015-3090-7


  • DOI: https://doi.org/10.1007/s11042-015-3090-7
