A spatial-temporal iterative tensor decomposition technique for action and gesture recognition

Su, Yuting; Wang, Haiyi; Jing, Peiguang; Xu, Chuanzhong

doi:10.1007/s11042-015-3090-7

Published: 16 December 2015

Volume 76, pages 10635–10652, (2017)

Multimedia Tools and Applications

Abstract

Classification of video sequences is an important task with many applications in video search and action recognition. In contrast to traditional approaches that transform the original video sequences into visual feature vectors, tensor-based methods classify video sequences using a natural representation of the original data. However, one obvious limitation of tensor-based methods is that the input video sequences usually have to be preprocessed to a unified temporal length. In this paper, we propose a technique for classifying video sequences of unequal temporal length, namely Spatial-Temporal Iterative Tensor Decomposition (S-TITD), which maps them to a uniform length. The proposed framework contains two primary steps. We first represent each original video sequence as a third-order tensor and perform Tucker-2 decomposition to obtain a reduced-dimension core tensor. We then encode the third (temporal) mode of the core tensor to a uniform length by adaptively selecting the most informative slices. Notably, these two steps are embedded in a dynamic learning framework so that the proposed method can update its results over time. We conduct a series of experiments on three public datasets for gesture and action recognition, and the experimental results show that the proposed S-TITD approach achieves better performance than state-of-the-art algorithms.
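The two steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the Tucker-2 factors are computed or how slices are ranked, so this sketch assumes a truncated HOSVD on the two spatial modes and ranks temporal slices by Frobenius norm as a stand-in for "most informative". The function names (`unfold`, `tucker2`, `select_slices`), the ranks, and the target length are all hypothetical, and the iterative dynamic-learning update is omitted.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def tucker2(video, rank_h, rank_w):
    """Tucker-2 via truncated HOSVD (assumed): compress the two spatial modes,
    leave the temporal mode untouched."""
    # Leading left singular vectors of each spatial unfolding.
    u_h = np.linalg.svd(unfold(video, 0), full_matrices=False)[0][:, :rank_h]
    u_w = np.linalg.svd(unfold(video, 1), full_matrices=False)[0][:, :rank_w]
    # Core = video x_1 u_h^T x_2 u_w^T, with shape (rank_h, rank_w, T).
    core = np.einsum("hwt,hi,wj->ijt", video, u_h, u_w)
    return core, u_h, u_w


def select_slices(core, length):
    """Keep the `length` temporal slices with the largest Frobenius norm
    (an assumed informativeness score), preserving temporal order."""
    n_frames = core.shape[2]
    if n_frames < length:
        raise ValueError("video has fewer frames than the target length")
    energy = np.linalg.norm(core, axis=(0, 1))  # one score per temporal slice
    keep = np.sort(np.argsort(energy)[::-1][:length])
    return core[:, :, keep]


# Two synthetic clips (height x width x frames) with different durations.
rng = np.random.default_rng(0)
short_clip = rng.standard_normal((32, 24, 40))
long_clip = rng.standard_normal((32, 24, 75))

features = [select_slices(tucker2(clip, 8, 6)[0], 20)
            for clip in (short_clip, long_clip)]
```

Both clips end up as core tensors of the same shape, so a downstream classifier could treat them uniformly even though the inputs differ in length.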

Author information

Authors and Affiliations

School of Electronic Information Engineering, Tianjin University, Tianjin, China
Yuting Su, Haiyi Wang, Peiguang Jing & Chuanzhong Xu

Corresponding author

Correspondence to Peiguang Jing.

About this article

Cite this article

Su, Y., Wang, H., Jing, P. et al. A spatial-temporal iterative tensor decomposition technique for action and gesture recognition. Multimed Tools Appl 76, 10635–10652 (2017). https://doi.org/10.1007/s11042-015-3090-7

Received: 28 July 2015
Revised: 19 October 2015
Accepted: 17 November 2015
Published: 16 December 2015
Issue Date: April 2017
DOI: https://doi.org/10.1007/s11042-015-3090-7
