Abstract
Recently, skeleton-based action recognition approaches have achieved substantial improvements by feeding various features extracted from skeleton sequences into deep neural networks for classification. In contrast to works that directly use joint locations to represent each action, we describe the relative relations embedded in skeletons to alleviate the impact of viewpoint diversity. Specifically, we propose a novel feature descriptor based on the rotation relations within skeletons to represent an action. Geometric algebra (GA) is introduced to compute these rotation relations via a particular operator, the rotor. To exploit both the spatial and temporal characteristics of a skeleton sequence, we design two rotor-based feature descriptors: one computed within a single skeleton frame and one computed between skeletons of consecutive frames. An efficient feature encoding strategy then transforms each kind of descriptor into an RGB image. Finally, we propose a two-stream convolutional neural network (CNN) framework that learns from the RGB images generated for each skeleton sequence and fuses the scores of the two networks to produce the final recognition result. Extensive experiments on the NTU RGB+D, Northwestern-UCLA, Gaming 3D, SYSU, and UTD-MHAD datasets demonstrate the superiority of our method.
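To make the rotor idea concrete: in 3D geometric algebra, the rotor that turns one unit bone vector onto another is R = (1 + b a)/|1 + b a|, which in vector terms corresponds to a unit quaternion built from the dot and cross products of the two bones. The sketch below is illustrative only and does not reproduce the paper's exact descriptor or encoding; the function name and the quaternion representation are our own assumptions.

```python
import numpy as np

def rotor_between(a, b):
    """Illustrative rotor between two 3D bone vectors, returned as a
    unit quaternion [w, x, y, z] (the quaternion form of the GA rotor
    R = (1 + b a) / |1 + b a|).  Assumes a and b are not anti-parallel,
    in which case the rotor is undefined up to a choice of axis."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    w = 1.0 + np.dot(a, b)          # scalar part: 1 + a.b
    xyz = np.cross(a, b)            # bivector part, dual to the rotation axis
    r = np.array([w, *xyz])
    return r / np.linalg.norm(r)    # normalize to a unit rotor

# Example: the rotor taking the x-axis onto the y-axis is a 90-degree
# rotation about z, i.e. [cos 45, 0, 0, sin 45].
r = rotor_between(np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]))
print(r)  # -> [0.7071 0.     0.     0.7071]
```

Such rotors, computed between bone pairs within one frame (spatial) or between the same bone in consecutive frames (temporal), yield a viewpoint-robust descriptor whose components can then be normalized to [0, 255] and arranged as an RGB image for the CNN streams.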
Acknowledgements
The authors would like to thank all reviewers for their constructive advice on improving this work. This work was partially supported by the National Natural Science Foundation of China (Nos. 61771319 and 61871154), the Natural Science Foundation of Guangdong Province (Nos. 2017A030313343 and 2019A1515011307), and the Shenzhen Science and Technology Project (No. JCYJ20180507182259896).
Cite this article
Liu, X., Li, Y. & Xia, R. Rotation-based spatial–temporal feature learning from skeleton sequences for action recognition. SIViP 14, 1227–1234 (2020). https://doi.org/10.1007/s11760-020-01644-0