Abstract
Skeleton-based dynamic hand gesture recognition plays an increasingly important role in human-computer interaction. It is well known that the choice of skeleton representation has a considerable impact on recognition results, yet most methods use only the raw skeleton data as input, which limits further gains in accuracy. In this paper, we propose a novel decoupled representation network (DR-Net) for skeleton-based dynamic hand gesture recognition, consisting of a temporal perception branch and a spatial perception branch. The temporal perception branch uses a temporal representation encoder to extract short-term and long-term motion features, which effectively capture the contextual information of skeleton sequences; in addition, we design a temporal fusion module (TFM) to capture multi-scale temporal features. The spatial perception branch uses a spatial representation encoder to extract spatial low-frequency and high-frequency features; in addition, we design a spatial fusion module (SFM) to enhance important spatial features. Experimental results and ablation studies on two benchmark datasets demonstrate that the proposed DR-Net is competitive with state-of-the-art methods.
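To make the idea of decoupled temporal representations concrete, the following is a minimal sketch of how short-term and long-term motion features could be derived from a raw skeleton sequence. It assumes short-term motion is frame-to-frame joint displacement and long-term motion is displacement over a larger temporal stride; the function name and stride values are illustrative, not taken from the paper.

```python
import numpy as np

def motion_features(seq, short_stride=1, long_stride=4):
    """Derive short- and long-term motion representations from a
    skeleton sequence of shape (T, J, C): T frames, J joints,
    C coordinates. The first frames are zero-padded so the outputs
    keep the same number of frames as the input."""
    short = np.zeros_like(seq)
    long_ = np.zeros_like(seq)
    # Short-term motion: displacement between consecutive frames.
    short[short_stride:] = seq[short_stride:] - seq[:-short_stride]
    # Long-term motion: displacement over a larger temporal stride.
    long_[long_stride:] = seq[long_stride:] - seq[:-long_stride]
    return short, long_

# Toy sequence: 8 frames, 22 hand joints, 3D coordinates.
seq = np.random.randn(8, 22, 3)
s, l = motion_features(seq)
print(s.shape, l.shape)  # (8, 22, 3) (8, 22, 3)
```

Representations of this kind would then be fed to the two encoder branches and combined by the fusion modules; the actual encoders in DR-Net are learned networks rather than fixed differences.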
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Zhong, Z., Li, Y., Yang, J. (2022). Decoupled Representation Network for Skeleton-Based Hand Gesture Recognition. In: Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds) Artificial Neural Networks and Machine Learning – ICANN 2022. ICANN 2022. Lecture Notes in Computer Science, vol 13530. Springer, Cham. https://doi.org/10.1007/978-3-031-15931-2_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15930-5
Online ISBN: 978-3-031-15931-2
eBook Packages: Computer Science, Computer Science (R0)