Skeleton-based action recognition by part-aware graph convolutional networks

Qin, Yang; Mo, Lingfei; Li, Chenyang; Luo, Jiayi

doi:10.1007/s00371-019-01644-3

Skeleton-based action recognition by part-aware graph convolutional networks

Original Article
Published: 26 March 2019

Volume 36, pages 621–631, (2020)
Cite this article

The Visual Computer Aims and scope Submit manuscript

Yang Qin¹,
Lingfei Mo ORCID: orcid.org/0000-0002-8561-9122¹,
Chenyang Li¹ &
…
Jiayi Luo¹

1386 Accesses
36 Citations
Explore all metrics

Abstract

This paper proposes an improved graph convolutional networks to deal with the skeleton-based action recognition. Inspired by splitting skeleton into several parts to feed deep networks, the part-aware convolutions is designed to replace common convolutions which is performed on all the neighboring joints. For scale invariance on multi-scale data, an Inception-like structure is introduced, which can concatenate feature maps from different convolution kernels. In contrast to methods based on LSTMs, the model presented is capable of extracting both temporal and spatial features from input data. Due to full use of spatial structure, the performance is enhanced greatly on various datasets. To evaluate the model, experiments were conducted on three benchmark skeleton-based datasets, including Berkeley MHAD, SBU Kinect Interaction, and NTU RGB-D datasets. The effectiveness and robustness of the model are demonstrated by comparing the experimental results of the proposed model with the state-of-the-art results. In addition, feature maps from different layers of a trained model are explored and the explanation of the part-aware convolutions is also provided.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 2

Spatio-temporal Attention Graph Convolutions for Skeleton-based Action Recognition

HybridNet: Integrating GCN and CNN for skeleton-based action recognition

Article 20 April 2022

Enhanced decoupling graph convolution network for skeleton-based action recognition

Article 26 October 2023

References

Bloom, V., Makris, D., Argyriou, V.: G3d: a gaming action dataset and real time action recognition evaluation framework. In: Computer Vision and Pattern Recognition Workshops (CVPRW), 2012 IEEE Computer Society Conference on, IEEE, pp. 7–12 (2012)
Chen, C., Jafari, R., Kehtarnavaz, N.: Utd-mhad: a multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor. In: Image Processing (ICIP), 2015 IEEE International Conference on, IEEE, pp. 168–172(2015)
Dawn, D.D., Shaikh, S.H.: A comprehensive survey of human action recognition with spatio-temporal interest point (stip) detector. Vis. Comput. 32(3), 289–306 (2016)
Article Google Scholar
Defferrard, M., Bresson, X., Vandergheynst, P.: Convolutional neural networks on graphs with fast localized spectral filtering. In: Advances in Neural Information Processing Systems, pp. 3844–3852 (2016)
Devanne, M., Wannous, H., Berretti, S., Pala, P., Daoudi, M., Del Bimbo, A.: 3-D human action recognition by shape analysis of motion trajectories on riemannian manifold. IEEE Trans. Cybern. 45(7), 1340–1352 (2015)
Article Google Scholar
Du, Y., Wang, W., Wang, L.: Hierarchical recurrent neural network for skeleton based action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1110–1118 (2015)
Hammond, D.K., Vandergheynst, P., Gribonval, R.: Wavelets on graphs via spectral graph theory. Appl. Comput. Harmonic Anal. 30(2), 129–150 (2011)
Article MathSciNet Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Hu, J.F., Zheng, W.S., Lai, J., Zhang, J.: Jointly learning heterogeneous features for rgb-d activity recognition. In: Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on, IEEE, pp. 5344–5352 (2015)
Ji, Y., Ye, G., Cheng, H.: Interactive body part contrast mining for human interaction recognition. In: Multimedia and Expo Workshops (ICMEW), 2014 IEEE International Conference on, IEEE, pp. 1–6 (2014)
Jiang, X., Zhong, F., Peng, Q., Qin, X.: Online robust action recognition based on a hierarchical model. Vis. Comput. 30(9), 1021–1033 (2014)
Article Google Scholar
Jones, J.P., Palmer, L.A.: An evaluation of the two-dimensional gabor filter model of simple receptive fields in cat striate cortex. J. Neurophysiol. 58(6), 1233–1258 (1987)
Article Google Scholar
Kapsouras, I., Nikolaidis, N.: Action recognition on motion capture data using a dynemes and forward differences representation. J. Vis. Commun. Image Represent. 25(6), 1432–1445 (2014)
Article Google Scholar
Ke, Q., An, S., Bennamoun, M., Sohel, F., Boussaid, F.: Skeletonnet: mining deep part features for 3-d action recognition. IEEE Signal Process. Lett. 24(6), 731–735 (2017a)
Article Google Scholar
Ke, Q., Bennamoun, M., An, S., Sohel, F., Boussaid, F.: A new representation of skeleton sequences for 3d action recognition. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, pp. 4570–4579 (2017b)
Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks (2016). arXiv preprint. arXiv:1609.02907
Li, Y., Tarlow, D., Brockschmidt, M., Zemel, R.: Gated graph sequence neural networks (2015). arXiv preprint. arXiv:1511.05493
Li, C., Wang, P., Wang, S., Hou, Y., Li, W.: Skeleton-based action recognition using LSTM and CNN (2017). arXiv preprint. arXiv:1707.02356
Li, C., Cui, Z., Zheng, W., Xu, C., Yang, J.: Spatio-temporal graph convolution for skeleton based action recognition (2018). arXiv preprint. arXiv:1802.09834
Lin, M., Chen, Q., Yan, S.: Network in network (2013). arXiv preprint. arXiv:1312.4400
Liu, J., Shahroudy, A., Xu, D., Wang, G.: Spatio-temporal lstm with trust gates for 3d human action recognition. In: European Conference on Computer Vision, Springer, pp. 816–833 (2016)
Liu, J., Wang, G., Hu, P., Duan, L.Y., Kot, A.C.: Global context-aware attention lstm networks for 3d action recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3671–3680 (2017)
Liu, J., Wang, G., Duan, L.Y., Abdiyeva, K., Kot, A.C.: Skeleton-based human action recognition with global context-aware attention lstm networks. IEEE Trans. Image Process. 27(4), 1586–1599 (2018)
Article MathSciNet Google Scholar
Ofli, F., Chaudhry, R., Kurillo, G., Vidal, R., Bajcsy, R.: Berkeley mhad: a comprehensive multimodal human action database. In: Applications of Computer Vision (WACV), 2013 IEEE Workshop on, IEEE, pp. 53–60 (2013)
Ofli, F., Chaudhry, R., Kurillo, G., Vidal, R., Bajcsy, R.: Sequence of the most informative joints (smij): a new representation for human skeletal action recognition. J. Vis. Commun. Image Represent. 25(1), 24–38 (2014)
Article Google Scholar
Ohn-Bar, E., Trivedi, M.: Joint angles similarities and hog2 for action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 465–470 (2013)
Shahroudy, A., Liu, J., Ng, T.T., Wang, G.: Ntu rgb+d: a large scale dataset for 3d human activity analysis. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014). arXiv preprint. arXiv:1409.1556
Song, S., Lan, C., Xing, J., Zeng, W., Liu, J.: An end-to-end spatio-temporal attention model for human action recognition from skeleton data. In: AAAI, 1, p. 7 (2017)
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
Tang, Y., Tian, Y., Lu, J., Li, P., Zhou, J.: Deep progressive reinforcement learning for skeleton-based action recognition. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5323–5332 (2018)
Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3d convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4489–4497 (2015)
van der Maaten, L., Hinton, G.E.: Visualizing data using t-sne. J. Mach. Learn. Res. 9, 2579–2605 (2008)
MATH Google Scholar
Vantigodi, S., Babu, R.V.: Real-time human action recognition from motion capture data. In: Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), 2013 Fourth National Conference on, IEEE, pp. 1–4 (2013)
Vantigodi, S., Radhakrishnan, V.B.: Action recognition from motion capture data using meta-cognitive rbf network classifier. In: Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), 2014 IEEE Ninth International Conference on, IEEE, pp. 1–6 (2014)
Vemulapalli, R., Arrate, F., Chellappa, R.: Human action recognition by representing 3d skeletons as points in a lie group. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 588–595 (2014)
Wang, H., Wang, L.: Modeling temporal dynamics and spatial configurations of actions using two-stream recurrent neural networks. In: e Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Wu, J., Hu, D., Chen, F.: Action recognition by hidden temporal models. Vis. Comput. 30(12), 1395–1404 (2014)
Article Google Scholar
Yan, S., Xiong, Y., Lin, D.: Spatial temporal graph convolutional networks for skeleton-based action recognition (2018). arXiv preprint. arXiv:1801.07455
Yun, K., Honorio, J., Chattopadhyay, D., Berg, T.L., Samaras, D.: Two-person interaction detection using body-pose features and multiple instance learning. In: Computer Vision and Pattern Recognition Workshops (CVPRW), 2012 IEEE Computer Society Conference on, IEEE, pp. 28–35 (2012)
Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: European Conference on Computer Vision, Springer, pp. 818–833 (2014)
Zeiler, M.D., Taylor, G.W., Fergus, R.: Adaptive deconvolutional networks for mid and high level feature learning. In: Computer Vision (ICCV), 2011 IEEE International Conference on, IEEE, pp. 2018–2025 (2011)
Zhang, P., Lan, C., Xing, J., Zeng, W., Xue, J., Zheng, N.: View adaptive recurrent neural networks for high performance human action recognition from skeleton data. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2136–2145 (2017)
Zhu, W., Lan, C., Xing, J., Zeng, W., Li, Y., Shen, L., Xie, X., et al.: Co-occurrence feature learning for skeleton based action recognition using regularized deep lstm networks. In: AAAI, 2, p. 8 (2016)

Download references

Acknowledgements

(Portions of) the research in this paper used the NTU RGB+D Action Recognition Dataset made available by the ROSE Lab at the Nanyang Technological University, Singapore.

Funding

This study was funded by the National Science Foundation of China [61603091, Multi-Dimensions Based Physical Activity Assessment for the Human Daily Life].

Author information

Authors and Affiliations

School of Instrument Science and Engineering, Southeast University, Nanjing, China
Yang Qin, Lingfei Mo, Chenyang Li & Jiayi Luo

Authors

Yang Qin
View author publications
You can also search for this author in PubMed Google Scholar
Lingfei Mo
View author publications
You can also search for this author in PubMed Google Scholar
Chenyang Li
View author publications
You can also search for this author in PubMed Google Scholar
Jiayi Luo
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Lingfei Mo.

Ethics declarations

Conflict of interest

The authors, Yang Qin, Lingfei Mo, Chenyang Li, and Jiayi Luo, declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Qin, Y., Mo, L., Li, C. et al. Skeleton-based action recognition by part-aware graph convolutional networks. Vis Comput 36, 621–631 (2020). https://doi.org/10.1007/s00371-019-01644-3

Download citation

Published: 26 March 2019
Issue Date: March 2020
DOI: https://doi.org/10.1007/s00371-019-01644-3

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Skeleton-based action recognition by part-aware graph convolutional networks

Abstract

Access this article

Similar content being viewed by others

Spatio-temporal Attention Graph Convolutions for Skeleton-based Action Recognition

HybridNet: Integrating GCN and CNN for skeleton-based action recognition

Enhanced decoupling graph convolution network for skeleton-based action recognition

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Skeleton-based action recognition by part-aware graph convolutional networks

Abstract

Access this article

Similar content being viewed by others

Spatio-temporal Attention Graph Convolutions for Skeleton-based Action Recognition

HybridNet: Integrating GCN and CNN for skeleton-based action recognition

Enhanced decoupling graph convolution network for skeleton-based action recognition

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation