
Rotation-based spatial–temporal feature learning from skeleton sequences for action recognition

  • Original Paper
  • Published in: Signal, Image and Video Processing

Abstract

Recently, skeleton-based action recognition approaches have achieved great improvements by feeding various features extracted from skeleton sequences into deep neural networks for classification. In contrast to works that directly use the locations of joints to represent each action, our paper describes the relative relations embedded in skeletons to alleviate the impact of viewpoint diversity. Specifically, we propose a novel feature descriptor based on the rotation relations in skeletons to represent an action. Geometric algebra (GA) is introduced to calculate and derive these rotation relations through a particular operator, namely the rotor in GA. To exploit both the spatial and temporal characteristics of a skeleton sequence, we design two different rotor-based feature descriptors, one from a single frame of skeleton and one from the skeletons of two consecutive frames. An efficient feature encoding strategy is then proposed to transform each kind of feature descriptor into an RGB image. Afterward, we propose a two-stream convolutional neural network (CNN) based framework that learns from the RGB images generated by each skeleton sequence and fuses the scores of the two networks to obtain the final recognition result. Extensive experimental results on the NTU RGB+D, Northwestern-UCLA, Gaming 3D, SYSU and UTD-MHAD datasets demonstrate the superiority of our method.
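The rotor relation the abstract refers to can be illustrated with a minimal sketch. This is not the paper's implementation; it only shows the underlying idea, assuming 3D joint coordinates, where a GA rotor is equivalent to a unit quaternion and the rotor between two bone directions can serve as a view-robust relative feature (helper names `rotor_between` and `rotate` are hypothetical):

```python
import numpy as np

def rotor_between(a, b):
    """GA rotor (encoded as a unit quaternion [w, x, y, z]) taking the
    direction of vector a to the direction of vector b.
    Assumes a and b are not exactly opposite (rotor would be undefined)."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    # Half-angle construction: scalar part 1 + a.b, bivector part a x b
    q = np.concatenate(([1.0 + a.dot(b)], np.cross(a, b)))
    return q / np.linalg.norm(q)

def rotate(q, v):
    """Apply the rotor as the sandwich product v' = q v q~."""
    w = q[0]
    u = q[1:]
    return v + 2.0 * np.cross(u, np.cross(u, v) + w * v)

# Two hypothetical bone vectors from one skeleton frame
bone_a = np.array([1.0, 0.0, 0.0])
bone_b = np.array([0.0, 1.0, 0.0])
q = rotor_between(bone_a, bone_b)
print(np.allclose(rotate(q, bone_a), bone_b))  # True
```

Because the rotor encodes only the relative rotation between bones (or between the same bone in consecutive frames), it stays unchanged when the whole skeleton is rigidly rotated, which is what makes such descriptors less sensitive to viewpoint.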



Acknowledgements

The authors would like to thank all reviewers’ constructive advice for improving this work. This work was partially supported by National Natural Science Foundation of China (Nos. 61771319 and 61871154), Natural Science Foundation of Guangdong Province (Nos. 2017A030313343 and 2019A1515011307), Shenzhen Science and Technology Project (No. JCYJ20180507182259896).

Author information

Correspondence to Yanshan Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Liu, X., Li, Y. & Xia, R. Rotation-based spatial–temporal feature learning from skeleton sequences for action recognition. SIViP 14, 1227–1234 (2020). https://doi.org/10.1007/s11760-020-01644-0

