
Skeleton-based action recognition by part-aware graph convolutional networks

  • Original Article
  • Published in: The Visual Computer

Abstract

This paper proposes an improved graph convolutional network for skeleton-based action recognition. Inspired by approaches that split the skeleton into several parts before feeding it to deep networks, a part-aware convolution is designed to replace the common convolution that is performed over all neighboring joints. To achieve scale invariance on multi-scale data, an Inception-like structure is introduced that concatenates feature maps from convolution kernels of different sizes. In contrast to LSTM-based methods, the proposed model extracts both temporal and spatial features from the input data. By making full use of the spatial structure of the skeleton, it greatly improves performance on various datasets. To evaluate the model, experiments were conducted on three benchmark skeleton datasets: Berkeley MHAD, SBU Kinect Interaction, and NTU RGB+D. The effectiveness and robustness of the model are demonstrated by comparing its results with the state of the art. In addition, feature maps from different layers of a trained model are explored, and an explanation of the part-aware convolution is provided.
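To make the abstract's two core ideas concrete, below is a minimal, hypothetical PyTorch sketch: a part-aware convolution that aggregates joint features within fixed body-part groups (each with its own weights) instead of convolving over all neighboring joints, and an Inception-like block that concatenates feature maps from temporal kernels of different sizes. The joint partition, module names, and channel sizes are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

# Hypothetical grouping of the 25 NTU RGB+D joints into five body parts
# (torso, arms, legs); the exact partition used in the paper is an assumption.
BODY_PARTS = [
    [0, 1, 2, 3, 20],            # torso
    [4, 5, 6, 7, 21, 22],        # left arm
    [8, 9, 10, 11, 23, 24],      # right arm
    [12, 13, 14, 15],            # left leg
    [16, 17, 18, 19],            # right leg
]

class PartAwareGraphConv(nn.Module):
    """Sketch of a part-aware convolution: rather than one shared weight
    over all neighboring joints, each body part gets its own 1x1
    convolution, and joint features are aggregated within parts."""
    def __init__(self, in_channels, out_channels, parts=BODY_PARTS):
        super().__init__()
        self.parts = parts
        self.part_convs = nn.ModuleList(
            nn.Conv2d(in_channels, out_channels, kernel_size=1)
            for _ in parts
        )

    def forward(self, x):
        # x: (batch, channels, frames, joints)
        out = torch.zeros(
            x.size(0), self.part_convs[0].out_channels,
            x.size(2), x.size(3), device=x.device)
        for conv, joints in zip(self.part_convs, self.parts):
            part = x[..., joints]                     # joints of one part
            pooled = part.mean(dim=-1, keepdim=True)  # aggregate within the part
            out[..., joints] = conv(pooled).expand(-1, -1, -1, len(joints))
        return out

class InceptionTemporalBlock(nn.Module):
    """Inception-like multi-scale block: parallel convolutions with
    different kernel sizes along the frame axis, concatenated on the
    channel dimension for scale invariance."""
    def __init__(self, in_channels, branch_channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_channels, branch_channels,
                      kernel_size=(k, 1), padding=(k // 2, 0))
            for k in kernel_sizes
        )

    def forward(self, x):
        return torch.cat([b(x) for b in self.branches], dim=1)

if __name__ == "__main__":
    x = torch.randn(2, 3, 30, 25)   # 2 sequences, xyz coords, 30 frames, 25 joints
    feats = PartAwareGraphConv(3, 16)(x)
    multi = InceptionTemporalBlock(16, 8)(feats)
    print(feats.shape, multi.shape)  # (2, 16, 30, 25) (2, 24, 30, 25)
```

In this sketch, per-part weights let the model treat, say, arm joints differently from leg joints, while the multi-kernel branches capture motions unfolding at different temporal scales before concatenation.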



Acknowledgements

Portions of the research in this paper used the NTU RGB+D Action Recognition Dataset made available by the ROSE Lab at Nanyang Technological University, Singapore.

Funding

This study was funded by the National Science Foundation of China [61603091, Multi-Dimensions Based Physical Activity Assessment for the Human Daily Life].

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Lingfei Mo.

Ethics declarations

Conflict of interest

The authors, Yang Qin, Lingfei Mo, Chenyang Li, and Jiayi Luo, declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Qin, Y., Mo, L., Li, C. et al. Skeleton-based action recognition by part-aware graph convolutional networks. Vis Comput 36, 621–631 (2020). https://doi.org/10.1007/s00371-019-01644-3

