A Multi-branch Cascade Transformer Network (MBCT–Net) for Hand Gesture Segmentation in Cluttered Background

  • Conference paper
Artificial Intelligence (CICAI 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13604)

Abstract

Hand gesture segmentation is an initial and essential step in hand gesture classification, which provides a simple, intuitive, concise and natural means of human–computer and human–robot interaction. However, segmenting hand gestures with varied hand shapes against cluttered backgrounds remains a challenging problem. To address it, a Multi-Branch Cascade Transformer Network (MBCT–Net) is proposed to segment hand regions from cluttered backgrounds. MBCT–Net is based on an encoder-decoder convolutional neural network; its encoder consists of a deep convolutional neural network (DCNN) module and a multi-branch cascade Transformer (MBCT) module. The MBCT module is designed to represent both the local details and the global semantic information of hand gestures. To enhance semantic interaction between different windows and expand the receptive field of MBCT–Net, we design a multi-window self-attention (MWSA) block in each branch of the MBCT module to extract hand gesture features. The MWSA block both reduces the amount of computation and strengthens semantic interaction between windows. Experiments were conducted to verify the effectiveness of the proposed MBCT–Net, and the experimental results demonstrate its effectiveness.
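
The abstract does not specify how the MWSA block partitions the feature map, so the following is only a rough sketch of why restricting self-attention to windows reduces computation. It assumes non-overlapping square windows (as in Swin-style window attention, reference 14) and counts query-key pairs; the function names, the 56 x 56 feature-map size and the window sizes are hypothetical and are not taken from the paper.

```python
def global_attention_pairs(height, width):
    """Query-key pairs when every token attends to every other token."""
    tokens = height * width
    return tokens * tokens


def windowed_attention_pairs(height, width, window):
    """Query-key pairs when attention is restricted to non-overlapping
    window x window tiles. Assumes window divides height and width."""
    if height % window or width % window:
        raise ValueError("window must divide both height and width")
    num_windows = (height // window) * (width // window)
    tokens_per_window = window * window
    return num_windows * tokens_per_window ** 2


if __name__ == "__main__":
    h = w = 56  # hypothetical feature-map size, not from the paper
    print("global:", global_attention_pairs(h, w))
    for win in (7, 14, 28):  # hypothetical window sizes, e.g. one per branch
        print(f"window {win}:", windowed_attention_pairs(h, w, win))
```

Under these assumptions the cost grows with the number of tokens times the window area rather than with the square of the number of tokens, so smaller windows are cheaper but see less context. Combining several window sizes is one plausible way to trade these off; whether MBCT–Net does exactly this is not stated in the abstract.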

Acknowledgment

This work was supported by the Scientific Research Foundation for Talented Scholars of Hebei University (Grant No. 521100221081), the Innovation and Entrepreneurship Training Program for College Students of Hebei University (Grant No. 2022156), the Scientific Research Foundation of Colleges and Universities in Hebei Province (Grant No. QN2022107) and the Science and Technology Program of Hebei Province (Grant No. 22370301D).

Corresponding author

Correspondence to Jing Qi.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Cui, Z., Zhou, G., Qi, J., Wang, H., Ding, X. (2022). A Multi-branch Cascade Transformer Network (MBCT–Net) for Hand Gesture Segmentation in Cluttered Background. In: Fang, L., Povey, D., Zhai, G., Mei, T., Wang, R. (eds) Artificial Intelligence. CICAI 2022. Lecture Notes in Computer Science, vol 13604. Springer, Cham. https://doi.org/10.1007/978-3-031-20497-5_44

  • DOI: https://doi.org/10.1007/978-3-031-20497-5_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20496-8

  • Online ISBN: 978-3-031-20497-5

  • eBook Packages: Computer Science, Computer Science (R0)
