Abstract
Classification of video sequences is an important task with many applications in video search and action recognition. In contrast to traditional approaches that transform the original video sequences into visual feature vectors, tensor-based methods classify video sequences using a natural representation of the original data. However, one obvious limitation of tensor-based methods is that the input video sequences usually have to be preprocessed to a unified temporal length. In this paper, we propose a technique for classifying video sequences of unequal temporal length, namely Spatial-Temporal Iterative Tensor Decomposition (S-TITD), which encodes each sequence to a uniform length. The proposed framework contains two primary steps. We first represent each original video sequence as a third-order tensor and perform Tucker-2 decomposition to obtain a reduced-dimension core tensor. We then encode the third mode of the core tensor to a uniform length by adaptively selecting the most informative slices. Notably, these two steps are embedded in a dynamic learning framework, so that the proposed method can update its results over time. We conduct a series of experiments on three public datasets for gesture and action recognition, and the experimental results show that the proposed S-TITD approach achieves better performance than state-of-the-art algorithms.
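The two steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' S-TITD implementation: it assumes a truncated HOSVD as the Tucker-2 solver, uses the Frobenius norm of each temporal slice as a stand-in for "most informative" (the abstract does not state the actual selection criterion), and omits the iterative and dynamic-update parts. The function names, ranks, and target length are hypothetical, and each sequence is assumed to have at least `length` frames.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def tucker2(video, rank_h, rank_w):
    """Tucker-2 via truncated HOSVD (an assumed solver).

    Only the two spatial modes of the (height, width, time) tensor are
    compressed; the temporal mode is left untouched.
    """
    u_h = np.linalg.svd(unfold(video, 0), full_matrices=False)[0][:, :rank_h]
    u_w = np.linalg.svd(unfold(video, 1), full_matrices=False)[0][:, :rank_w]
    # core[a, b, t] = sum_{i, j} u_h[i, a] * u_w[j, b] * video[i, j, t]
    core = np.einsum("ia,jb,ijt->abt", u_h, u_w, video)
    return core, u_h, u_w


def select_slices(core, length):
    """Keep the `length` temporal slices with the largest Frobenius norm.

    The energy criterion is an assumption made for illustration.
    Selected slices are returned in their original temporal order.
    """
    energy = np.linalg.norm(core, axis=(0, 1))
    keep = np.sort(np.argsort(energy)[-length:])
    return core[:, :, keep]


def encode(video, rank_h=8, rank_w=8, length=10):
    """Map a video of any duration >= `length` to a fixed-size core tensor."""
    core, _, _ = tucker2(video, rank_h, rank_w)
    return select_slices(core, length)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    short_clip = rng.standard_normal((32, 24, 15))  # 15 frames
    long_clip = rng.standard_normal((32, 24, 40))   # 40 frames
    print(encode(short_clip).shape, encode(long_clip).shape)
```

Under these assumptions, clips with different numbers of frames are mapped to core tensors of identical shape, which is the property that lets a tensor-based classifier consume them without resampling the videos to a common duration beforehand.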
Cite this article
Su, Y., Wang, H., Jing, P. et al. A spatial-temporal iterative tensor decomposition technique for action and gesture recognition. Multimed Tools Appl 76, 10635–10652 (2017). https://doi.org/10.1007/s11042-015-3090-7
DOI: https://doi.org/10.1007/s11042-015-3090-7