Abstract
The aim of this paper is to introduce a new descriptor for the spatio-temporal volume (STV). Human motion is completely represented by the STV (action volume), which is constructed by stacking the human silhouettes extracted from consecutive frames. The action volume therefore contains both the spatial and the temporal information of an action. The main contribution of this paper is a new affine-invariant action volume descriptor based on a function of spherical harmonic coefficients; that is, the descriptor is invariant under rotation, non-uniform scaling and translation. In the 3D shape analysis literature, there have been a few attempts to describe a 3D shape using spherical harmonic coefficients. However, those descriptors are only rotation invariant, not affine invariant. In addition, the proposed approach employs a parametric form of spherical harmonics that handles genus-zero surfaces regardless of whether they are stellar (star-shaped) or not. Another contribution of this paper is the way the action volume is constructed. We applied the proposed descriptor to the KTH, Weizmann, IXMAS and Robust datasets and compared the performance of our algorithm with that of competing methods in the literature. Our experimental results show that the method performs comparably to the most successful recent algorithms.
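The abstract contrasts the proposed descriptor with earlier spherical-harmonic shape descriptors that are only rotation invariant. The sketch below illustrates that baseline idea in Python/NumPy: a function on the unit sphere is projected onto complex spherical harmonics, and the energy of each degree l is kept as a rotation-invariant feature. It is not the paper's affine-invariant descriptor, nor its parametric expansion of genus-zero surfaces. The helper `build_action_volume` shows only the basic silhouette stacking mentioned above, not the paper's specific construction. All function names, the test function `f`, the rotation angles and the quadrature sizes are hypothetical choices made for illustration.

```python
import numpy as np
from math import factorial


def build_action_volume(silhouettes):
    """Stack per-frame binary silhouettes (H x W) into a T x H x W space-time volume."""
    return np.stack([np.asarray(s, dtype=bool) for s in silhouettes], axis=0)


def assoc_legendre(l, m, x):
    """Associated Legendre P_l^m(x), 0 <= m <= l, via the standard three-term recurrence."""
    pmm = np.ones_like(x)
    if m > 0:
        somx2 = np.sqrt((1.0 - x) * (1.0 + x))
        fact = 1.0
        for _ in range(m):
            pmm = -pmm * fact * somx2  # builds (-1)^m (2m-1)!! (1-x^2)^(m/2)
            fact += 2.0
    if l == m:
        return pmm
    pmmp1 = x * (2 * m + 1) * pmm
    for ll in range(m + 2, l + 1):
        pll = ((2 * ll - 1) * x * pmmp1 - (ll + m - 1) * pmm) / (ll - m)
        pmm, pmmp1 = pmmp1, pll
    return pmmp1


def sh_coefficients(f, max_degree, n_theta=16, n_phi=32):
    """Project a real function f(x, y, z) on the unit sphere onto complex Y_l^m, 0 <= m <= l."""
    u, w = np.polynomial.legendre.leggauss(n_theta)  # Gauss-Legendre nodes in cos(theta)
    phi = 2.0 * np.pi * np.arange(n_phi) / n_phi     # uniform nodes in phi
    U, PHI = np.meshgrid(u, phi, indexing="ij")
    sin_t = np.sqrt(1.0 - U**2)
    values = f(sin_t * np.cos(PHI), sin_t * np.sin(PHI), U)
    weights = w[:, None] * (2.0 * np.pi / n_phi)    # quadrature for d(cos theta) d(phi)
    coeffs = {}
    for l in range(max_degree + 1):
        for m in range(l + 1):
            norm = np.sqrt((2 * l + 1) / (4 * np.pi) * factorial(l - m) / factorial(l + m))
            Y = norm * assoc_legendre(l, m, U) * np.exp(1j * m * PHI)
            coeffs[(l, m)] = np.sum(values * np.conj(Y) * weights)
    return coeffs


def energy_descriptor(coeffs, max_degree):
    """Per-degree energies sqrt(sum_m |c_lm|^2); for real f, |c_{l,-m}| = |c_{l,m}|."""
    out = []
    for l in range(max_degree + 1):
        e = abs(coeffs[(l, 0)]) ** 2
        e += 2.0 * sum(abs(coeffs[(l, m)]) ** 2 for m in range(1, l + 1))
        out.append(np.sqrt(e))
    return np.array(out)


def rotated(f, R):
    """Return g(p) = f(R^T p), i.e. f rotated by the 3x3 rotation matrix R."""
    def g(x, y, z):
        return f(R[0, 0] * x + R[1, 0] * y + R[2, 0] * z,
                 R[0, 1] * x + R[1, 1] * y + R[2, 1] * z,
                 R[0, 2] * x + R[1, 2] * y + R[2, 2] * z)
    return g


# Hypothetical test signal on the sphere, band-limited to degree l <= 3.
def f(x, y, z):
    return 1.0 + x * y + 0.5 * z**3 - y


a, b = 0.7, 1.3  # arbitrary rotation angles
Rz = np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]])
Rx = np.array([[1.0, 0.0, 0.0], [0.0, np.cos(b), -np.sin(b)], [0.0, np.sin(b), np.cos(b)]])
R = Rz @ Rx

L = 4
d_orig = energy_descriptor(sh_coefficients(f, L), L)
d_rot = energy_descriptor(sh_coefficients(rotated(f, R), L), L)
print(np.round(d_orig, 6))
print(np.round(d_rot, 6))
```

Under these assumptions, `d_orig` and `d_rot` agree to floating-point precision, because a rotation only mixes coefficients within each degree l. A non-uniform scaling of the underlying shape does not, in general, preserve these per-degree energies, which is why such descriptors are described above as rotation invariant but not affine invariant.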








Additional information
M. Palhang and N. Gheissari contributed equally to this paper.
About this article
Cite this article
Razzaghi, P., Palhang, M. & Gheissari, N. A new invariant descriptor for action recognition based on spherical harmonics. Pattern Anal Applic 16, 507–518 (2013). https://doi.org/10.1007/s10044-012-0274-x
DOI: https://doi.org/10.1007/s10044-012-0274-x