Abstract
Action videos are multidimensional data and can be naturally represented as data tensors. While tensor computing is widely used in computer vision, the geometry of tensor space is often ignored. The aim of this paper is to demonstrate the importance of the intrinsic geometry of tensor space, which yields a highly discriminating structure for action recognition. We characterize data tensors as points on a product manifold and model them statistically using least squares regression. To this end, we factorize a data tensor along each of its orders using higher order singular value decomposition (HOSVD) and then place each factorized element on a Grassmann manifold. Furthermore, we account for the underlying geometry of the manifolds and formulate least squares regression as a composite function. This gives a natural extension from Euclidean space to manifolds. Consequently, classification is performed using geodesic distance on a product manifold where each factor manifold is Grassmannian. Our method exploits appearance and motion without explicitly modeling shapes and dynamics. We assess the proposed method using three gesture databases, namely the Cambridge hand-gesture, the UMD Keck body-gesture, and the CHALEARN gesture challenge data sets. Experimental results reveal that the proposed method not only performs well on the standard benchmark data sets but also generalizes well on the one-shot-learning gesture challenge. Furthermore, it is based on a simple statistical model and the intrinsic geometry of tensor space.
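The representation and distance described in the abstract can be illustrated with a minimal NumPy sketch. It is not the chapter's implementation: the function names, the choice of the leading `p` left singular vectors of each unfolding as the subspace basis, and the way per-factor distances are combined (root sum of squared principal angles) are all assumptions made for illustration. The least squares regression step is not shown.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-k unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def hosvd_subspaces(tensor, p):
    """Return one orthonormal basis (n_k x p) per mode of the tensor.

    Assumption: each subspace is spanned by the p leading left singular
    vectors of the corresponding unfolding.
    """
    bases = []
    for mode in range(tensor.ndim):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        bases.append(u[:, :p])
    return bases


def principal_angles(a, b):
    """Principal angles between the subspaces spanned by orthonormal a and b."""
    cosines = np.linalg.svd(a.T @ b, compute_uv=False)
    # Clip to guard against round-off pushing values slightly outside [-1, 1].
    return np.arccos(np.clip(cosines, -1.0, 1.0))


def product_geodesic_distance(bases_a, bases_b):
    """Geodesic distance on a product of Grassmann manifolds.

    Each factor contributes the 2-norm of its principal angles; the factors
    are combined as the root of the sum of squared factor distances.
    """
    angles = [principal_angles(a, b) for a, b in zip(bases_a, bases_b)]
    return float(np.linalg.norm(np.concatenate(angles)))


# Hypothetical usage: two random "videos" of size height x width x frames.
rng = np.random.default_rng(0)
video_a = rng.standard_normal((20, 20, 8))
video_b = rng.standard_normal((20, 20, 8))
distance = product_geodesic_distance(
    hosvd_subspaces(video_a, p=4), hosvd_subspaces(video_b, p=4)
)
```

A nearest-neighbor classifier would assign a query video the label of the training video with the smallest such distance; the chapter instead builds its classifier on a regression model, which this sketch does not attempt to reproduce.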
Editors: Isabelle Guyon and Vassilis Athitsos.
Notes
1. In this paper, we are only interested in the field of real numbers \(\mathbb{R}\). Unitary groups may be considered in other contexts.
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Lui, Y.M. (2017). Human Gesture Recognition on Product Manifolds. In: Escalera, S., Guyon, I., Athitsos, V. (eds) Gesture Recognition. The Springer Series on Challenges in Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-57021-1_2
DOI: https://doi.org/10.1007/978-3-319-57021-1_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57020-4
Online ISBN: 978-3-319-57021-1
eBook Packages: Computer Science, Computer Science (R0)