Abstract
Skeleton-based dynamic hand gesture recognition plays an increasingly important role in human-computer interaction. It is well known that the choice of skeleton representation has a considerable impact on recognition results, yet most methods use only the raw skeleton data as input, which limits further gains in accuracy. In this paper, we propose a novel decoupled representation network (DR-Net) for skeleton-based dynamic hand gesture recognition, consisting of a temporal perception branch and a spatial perception branch. The temporal perception branch uses a temporal representation encoder to extract short-term and long-term motion features, which effectively capture the contextual information of skeleton sequences; in addition, we design a temporal fusion module (TFM) to capture multi-scale temporal features. The spatial perception branch uses a spatial representation encoder to extract spatial low-frequency and high-frequency features; in addition, we design a spatial fusion module (SFM) to enhance important spatial features. Experimental results and ablation studies on two benchmark datasets demonstrate that the proposed DR-Net is competitive with state-of-the-art methods.
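To make the idea of decoupled temporal representations concrete, the following is a minimal sketch of how short-term and long-term motion features could be derived from a raw skeleton sequence. It assumes short-term motion is frame-to-frame joint displacement and long-term motion is displacement over a larger temporal stride; the function name and stride values are illustrative, not taken from the paper.

```python
import numpy as np

def motion_features(seq, short_stride=1, long_stride=4):
    """Derive short- and long-term motion representations from a
    skeleton sequence of shape (T, J, C): T frames, J joints,
    C coordinates. The first frames are zero-padded so the outputs
    keep the same number of frames as the input."""
    short = np.zeros_like(seq)
    long_ = np.zeros_like(seq)
    # Short-term motion: displacement between consecutive frames.
    short[short_stride:] = seq[short_stride:] - seq[:-short_stride]
    # Long-term motion: displacement over a larger temporal stride.
    long_[long_stride:] = seq[long_stride:] - seq[:-long_stride]
    return short, long_

# Toy sequence: 8 frames, 22 hand joints, 3D coordinates.
seq = np.random.randn(8, 22, 3)
s, l = motion_features(seq)
print(s.shape, l.shape)  # (8, 22, 3) (8, 22, 3)
```

Representations of this kind would then be fed to the two encoder branches and combined by the fusion modules; the actual encoders in DR-Net are learned networks rather than fixed differences.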
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Zhong, Z., Li, Y., Yang, J. (2022). Decoupled Representation Network for Skeleton-Based Hand Gesture Recognition. In: Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds) Artificial Neural Networks and Machine Learning – ICANN 2022. ICANN 2022. Lecture Notes in Computer Science, vol 13530. Springer, Cham. https://doi.org/10.1007/978-3-031-15931-2_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15930-5
Online ISBN: 978-3-031-15931-2
eBook Packages: Computer Science, Computer Science (R0)