Abstract
Recently, skeleton-based action recognition approaches have achieved substantial improvements by feeding various features extracted from skeleton sequences into deep neural networks for classification. In contrast to works that directly use joint locations to represent each action, we describe the relative relations embedded in skeletons to alleviate the impact of viewpoint diversity. Specifically, we propose a novel feature descriptor based on the rotation relations within skeletons to represent an action. Geometric algebra (GA) is introduced to compute these rotation relations via a particular operator, the rotor. To exploit both the spatial and temporal characteristics of a skeleton sequence, we design two rotor-based feature descriptors: one computed within a single skeleton frame and one computed between skeletons of consecutive frames. An efficient feature encoding strategy then transforms each kind of descriptor into an RGB image. Finally, we propose a two-stream convolutional neural network (CNN) framework that learns from the RGB images generated for each skeleton sequence and fuses the scores of the two networks to produce the final recognition result. Extensive experiments on the NTU RGB+D, Northwestern-UCLA, Gaming 3D, SYSU, and UTD-MHAD datasets demonstrate the superiority of our method.
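To make the rotor idea concrete: in 3D geometric algebra, the rotor that turns one unit bone vector onto another is R = (1 + b a)/|1 + b a|, which in vector terms corresponds to a unit quaternion built from the dot and cross products of the two bones. The sketch below is illustrative only and does not reproduce the paper's exact descriptor or encoding; the function name and the quaternion representation are our own assumptions.

```python
import numpy as np

def rotor_between(a, b):
    """Illustrative rotor between two 3D bone vectors, returned as a
    unit quaternion [w, x, y, z] (the quaternion form of the GA rotor
    R = (1 + b a) / |1 + b a|).  Assumes a and b are not anti-parallel,
    in which case the rotor is undefined up to a choice of axis."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    w = 1.0 + np.dot(a, b)          # scalar part: 1 + a.b
    xyz = np.cross(a, b)            # bivector part, dual to the rotation axis
    r = np.array([w, *xyz])
    return r / np.linalg.norm(r)    # normalize to a unit rotor

# Example: the rotor taking the x-axis onto the y-axis is a 90-degree
# rotation about z, i.e. [cos 45, 0, 0, sin 45].
r = rotor_between(np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]))
print(r)  # -> [0.7071 0.     0.     0.7071]
```

Such rotors, computed between bone pairs within one frame (spatial) or between the same bone in consecutive frames (temporal), yield a viewpoint-robust descriptor whose components can then be normalized to [0, 255] and arranged as an RGB image for the CNN streams.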
Acknowledgements
The authors would like to thank all reviewers for their constructive advice on improving this work. This work was partially supported by the National Natural Science Foundation of China (Nos. 61771319 and 61871154), the Natural Science Foundation of Guangdong Province (Nos. 2017A030313343 and 2019A1515011307), and the Shenzhen Science and Technology Project (No. JCYJ20180507182259896).
Cite this article
Liu, X., Li, Y. & Xia, R. Rotation-based spatial–temporal feature learning from skeleton sequences for action recognition. SIViP 14, 1227–1234 (2020). https://doi.org/10.1007/s11760-020-01644-0