Abstract
This paper introduces a human action recognition method based on kernelized Grassmann manifold learning. The goal is to find a map that transfers high-dimensional data to a discriminative low-dimensional space while respecting the geometry of the manifold. To this end, we propose a multi-graph embedding method built on three graphs: the center-class, within-class, and between-class similarity graphs. Together, these graphs capture both local and semi-global information in the data, which is the key benefit of the proposed method. Graphs play an important role in subspace learning, but most graph-based methods build their graphs with the Euclidean distance and therefore ignore the geometry of manifold-valued data, which makes them sensitive to noise and outliers. To handle these problems, we use the geodesic distance to build the neighborhood graphs. We analyze the performance of the proposed method on both noisy (complex) and less noisy (simple) datasets, using two geodesic distance metrics to compute the distances on these datasets. Experimental results on these datasets demonstrate the performance of the proposed method.
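To make the idea of a geodesic neighborhood graph concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not name the two geodesic metrics it uses, so the arc-length and projection (chordal) distances below are assumed, commonly used choices. The function names, the SVD-based subspace representation, and the symmetric k-nearest-neighbor rule are likewise illustrative assumptions.

```python
import numpy as np


def orthonormal_basis(data, dim):
    """Represent a sample (n x m matrix) as a point on the Grassmann manifold:
    an orthonormal basis of its dominant dim-dimensional column space.
    Assumption: subspaces are obtained by a thin SVD."""
    u, _, _ = np.linalg.svd(data, full_matrices=False)
    return u[:, :dim]


def principal_angles(x, y):
    """Principal angles between the subspaces spanned by orthonormal x and y."""
    cosines = np.linalg.svd(x.T @ y, compute_uv=False)
    # Clip to guard against round-off pushing values slightly outside [-1, 1].
    return np.arccos(np.clip(cosines, -1.0, 1.0))


def arc_length_distance(x, y):
    """Geodesic (arc-length) distance: 2-norm of the principal angles."""
    return float(np.linalg.norm(principal_angles(x, y)))


def projection_distance(x, y):
    """Projection (chordal) distance: 2-norm of the sines of the angles."""
    return float(np.linalg.norm(np.sin(principal_angles(x, y))))


def knn_graph(points, k, dist):
    """Symmetric k-nearest-neighbor adjacency matrix under a manifold distance.
    Hypothetical helper: the paper's three graphs are not reproduced here."""
    n = len(points)
    d = np.array([[dist(p, q) for q in points] for p in points])
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        neighbors = [j for j in np.argsort(d[i]) if j != i][:k]
        adj[i, neighbors] = True
    return adj | adj.T
```

Under these assumptions, replacing the Euclidean distance with `arc_length_distance` or `projection_distance` in `knn_graph` is the only change needed to make the neighborhood structure follow the manifold geometry instead of the ambient space.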
Cite this article
Rahimi, S., Aghagolzadeh, A. & Ezoji, M. Human action recognition based on the Grassmann multi-graph embedding. SIViP 13, 271–279 (2019). https://doi.org/10.1007/s11760-018-1354-1