SVIM: A Skeleton-Based View-Invariant Method for Online Gesture Recognition

Zhao, Yang; Dong, Lanfang; Li, Guoxin; Tang, Yingchao; Zhang, Yuhang; Mao, Meng; Li, Guoming; Tan, Linxiang

doi:10.1007/978-3-031-46674-8_17

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14179)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

Abstract

Online gesture recognition is challenging in practical application scenarios because the gesture is not always performed directly in front of the camera. To address the challenges caused by multiple viewpoints in skeleton data, we propose a novel view-invariant method for online skeleton-based gesture recognition. Our method treats the whole skeleton sequence as a point set and applies a PCA-based view-invariant data preprocessing algorithm to it. Because similar skeleton data have similar point-set distribution features, applying PCA transforms them to relatively stable viewpoints, which ensures the viewpoint stability of our gesture recognition model. We conduct extensive experiments on the NTU RGB+D and Northwestern-UCLA benchmark datasets, both of which contain multiple viewpoints, and the results demonstrate the effectiveness of the proposed method.
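The abstract does not give the details of the preprocessing algorithm, so the following is only a minimal sketch of the general idea of PCA-based view normalization, not the authors' implementation. It assumes a skeleton sequence stored as a NumPy array of shape (frames, joints, 3). The function name pca_canonicalize and the skewness-based rule for fixing the sign of each principal axis are assumptions made for illustration.

```python
import numpy as np


def pca_canonicalize(seq):
    """Rotate a skeleton sequence into the frame of its principal axes.

    seq: array of shape (T, J, 3), T frames of J joints in 3D.
    Returns an array of the same shape expressed in a PCA-aligned frame.
    Hypothetical helper: illustrates the idea only, not the paper's algorithm.
    """
    # Treat every joint of every frame as one point in a single point set.
    pts = np.asarray(seq, dtype=float).reshape(-1, 3)
    centered = pts - pts.mean(axis=0)

    # Principal axes are the eigenvectors of the 3x3 covariance matrix.
    cov = centered.T @ centered / len(centered)
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    axes = eigvecs[:, ::-1]           # columns sorted by descending variance

    # Eigenvectors are only defined up to sign. Assumed convention: flip each
    # axis so the third moment of the projected points is non-negative.
    proj = centered @ axes
    signs = np.sign((proj ** 3).sum(axis=0))
    signs[signs == 0] = 1.0
    axes = axes * signs

    # Keep the frame right-handed so the transform is a pure rotation.
    axes[:, 2] = np.cross(axes[:, 0], axes[:, 1])

    return (centered @ axes).reshape(np.shape(seq))
```

Under these assumptions, the same gesture captured from two camera poses that differ by a rotation and a translation maps to approximately the same canonical coordinates, provided the three variances are distinct and the point distribution is not symmetric along the first two axes.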

Acknowledgements

This work was supported by the National Key Research and Development Program of China under Grant No. 2020YFB1313602. The authors thank the reviewers for their valuable suggestions.

Author information

Authors and Affiliations

University of Science and Technology of China, Hefei, 230026, China
Yang Zhao, Lanfang Dong, Guoxin Li, Yingchao Tang & Yuhang Zhang
AI Lab, China Merchants Bank, Shenzhen, 518040, China
Meng Mao, Guoming Li & Linxiang Tan

Corresponding author

Correspondence to Lanfang Dong.

Editor information

Editors and Affiliations

Northeastern University, Shenyang, China
Xiaochun Yang
The University of Indonesia, Depok, Indonesia
Heru Suhartanto
Beijing Institute of Technology, Beijing, China
Guoren Wang
Northeastern University, Shenyang, China
Bin Wang
University of Technology Sydney, Sydney, NSW, Australia
Jing Jiang
Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
Bing Li
Sun Yat-sen University, Guangzhou, China
Huaijie Zhu
Anhui University, Hefei, China
Ningning Cui

About this paper

Cite this paper

Zhao, Y. et al. (2023). SVIM: A Skeleton-Based View-Invariant Method for Online Gesture Recognition. In: Yang, X., et al. Advanced Data Mining and Applications. ADMA 2023. Lecture Notes in Computer Science, vol 14179. Springer, Cham. https://doi.org/10.1007/978-3-031-46674-8_17

DOI: https://doi.org/10.1007/978-3-031-46674-8_17
Published: 05 November 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46673-1
Online ISBN: 978-3-031-46674-8
eBook Packages: Computer Science, Computer Science (R0)
