Hierarchical Gaussian descriptor based on local pooling for action recognition

Nguyen, Xuan Son; Mouaddib, Abdel-Illah; Nguyen, Thanh Phuong

doi:10.1007/s00138-018-0989-9

Original paper
Published: 12 November 2018

Volume 30, pages 321–343 (2019)

Machine Vision and Applications

Xuan Son Nguyen¹ (ORCID: orcid.org/0000-0002-2776-2254), Abdel-Illah Mouaddib¹ & Thanh Phuong Nguyen²˒³


Abstract

In this paper, we propose a new approach based on Gaussian descriptors for action recognition. We first develop a feature representation technique that encodes high-order statistics of local features at two levels, using single Gaussians to capture the distributions involved. Summarizing heterogeneous feature vectors together can lose information about their distribution; to mitigate this, we use K-means clustering and sparse coding to construct sets of feature vectors and perform the summarization over each set. We then present two action recognition methods, one based on depth images and one based on pose data. In both methods, the proposed feature representation technique is applied to obtain discriminative action descriptors. Experimental evaluation on seven benchmark datasets, namely MSRAction3D, MSRGesture3D, DHA, SKIG, Florence, UTKinect, and HDM05, shows that our methods achieve promising results on all of them.
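The abstract describes summarizing sets of local features with single Gaussians and pooling them over K-means clusters. The sketch below illustrates one such summarization step under stated assumptions: each Gaussian \(\mathcal{N}(\mu, \Sigma)\) is embedded as the SPD matrix \(\begin{bmatrix}\Sigma + \mu\mu^\top & \mu \\ \mu^\top & 1\end{bmatrix}\), an embedding commonly used by log-Euclidean Gaussian descriptors, and flattened via the matrix logarithm; features are hard-assigned to K-means centroids, one Gaussian is fitted per cluster, and the per-cluster vectors are concatenated. The helper names (`gaussian_to_spd`, `log_euclidean_vec`, `assign`, `kmeans`, `local_pooled_descriptor`), the regularizer `eps`, and the zero block for near-empty clusters are hypothetical. This is not the authors' implementation, and it omits the second level of the hierarchy and the sparse-coding variant.

```python
import numpy as np


def gaussian_to_spd(x, w=None, eps=1e-6):
    """Fit a (weighted) Gaussian to the rows of x and embed it as a (d+1)x(d+1) SPD matrix."""
    n, d = x.shape
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    w = w / w.sum()                          # normalize weights to sum to 1
    mu = w @ x                               # weighted mean, shape (d,)
    xc = x - mu
    cov = (xc * w[:, None]).T @ xc           # weighted (biased) covariance
    cov = cov + eps * np.eye(d)              # regularize so the embedding stays positive definite
    top = np.hstack([cov + np.outer(mu, mu), mu[:, None]])
    bottom = np.hstack([mu, [1.0]])[None, :]
    return np.vstack([top, bottom])


def log_euclidean_vec(p):
    """Matrix logarithm of an SPD matrix, flattened to its upper triangle."""
    vals, vecs = np.linalg.eigh(p)
    log_p = (vecs * np.log(vals)) @ vecs.T   # V diag(log λ) V^T
    rows, cols = np.triu_indices(p.shape[0])
    # Off-diagonal entries occur twice in the full matrix; scaling by sqrt(2)
    # makes the vector's Euclidean norm equal the Frobenius norm of log_p.
    scale = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return log_p[rows, cols] * scale


def assign(x, centers):
    """Index of the nearest centroid for each row of x."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns k centroids."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = assign(x, centers)
        for j in range(k):
            if np.any(labels == j):          # keep the old centroid if a cluster empties
                centers[j] = x[labels == j].mean(axis=0)
    return centers


def local_pooled_descriptor(x, centers, min_points=2):
    """Concatenate one log-Euclidean Gaussian vector per K-means cluster."""
    d = x.shape[1]
    labels = assign(x, centers)
    block = (d + 1) * (d + 2) // 2           # length of one upper-triangle vector
    parts = []
    for j in range(len(centers)):
        members = x[labels == j]
        if len(members) < min_points:
            parts.append(np.zeros(block))    # too few points: use a zero block
        else:
            parts.append(log_euclidean_vec(gaussian_to_spd(members)))
    return np.concatenate(parts)


rng = np.random.default_rng(1)
features = rng.normal(size=(200, 3))         # e.g. 200 local features of dimension 3
centers = kmeans(features, k=4)
descriptor = local_pooled_descriptor(features, centers)
print(descriptor.shape)  # (40,)
```

The optional weights `w` in `gaussian_to_spd` mark where soft assignments could enter instead of hard K-means labels, for instance coefficients from a sparse-coding step; that substitution is an assumption, not something the abstract specifies.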


Notes

In our experimental settings, joint \(j\) has only one or two neighbors.
We fix the codebook size \(K=50\) and evaluate SC-IELLogE-IELLogE on the MSRAction3D dataset with all possible combinations of \(D\), \(D_1\), and \(D_2\), where \(D \in \{1,2,3\}\) and \(D_1, D_2 \in \{-4,-3,-2,-1,1,2,3,4\}\).
To select the most appropriate value of \(K\), we evaluate all methods with \(K \in \{50,100,150,200,250\}\).
Our experiments are conducted with \(K \in \{50,100,150,200,250\}\).
We evaluate these methods with \(K \in \{50,100,150,200,250\}\) on the Florence and UTKinect datasets, and with \(K \in \{100,200,300,400,500,600,700\}\) on the HDM05 dataset.
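The \((D, D_1, D_2)\) sweep in the notes above amounts to \(3 \times 8 \times 8 = 192\) configurations at \(K=50\). The snippet below only enumerates that grid to make the count concrete; what \(D\), \(D_1\), and \(D_2\) control is defined in the full paper and is not reproduced here, and the variable names are illustrative.

```python
from itertools import product

D_values = [1, 2, 3]
offsets = [-4, -3, -2, -1, 1, 2, 3, 4]   # candidate values for both D1 and D2 (0 is excluded)

# Every (D, D1, D2) combination evaluated with the codebook size fixed at K = 50.
grid = list(product(D_values, offsets, offsets))
print(len(grid))  # 192
```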


Acknowledgements

Portions of the research in this paper use the DHA video dataset collected by the Research Center for Information Technology Innovation (CITI), Academia Sinica.

Author information

Authors and Affiliations

CNRS, GREYC, UMR 6072, Université de Caen Basse-Normandie, 14000, Caen, France
Xuan Son Nguyen & Abdel-Illah Mouaddib
CNRS, ENSAM, LSIS, UMR 7296, Aix Marseille Université, 13397, Marseille, France
Thanh Phuong Nguyen
CNRS, LSIS, UMR 7296, Université de Toulon, 83957, La Garde, France
Thanh Phuong Nguyen


Corresponding author

Correspondence to Xuan Son Nguyen.


About this article

Cite this article

Nguyen, X.S., Mouaddib, AI. & Nguyen, T.P. Hierarchical Gaussian descriptor based on local pooling for action recognition. Machine Vision and Applications 30, 321–343 (2019). https://doi.org/10.1007/s00138-018-0989-9


Received: 16 August 2017
Revised: 01 September 2018
Accepted: 01 November 2018
Published: 12 November 2018
Issue Date: 04 March 2019
DOI: https://doi.org/10.1007/s00138-018-0989-9
