A Multi-branch Cascade Transformer Network (MBCT–Net) for Hand Gesture Segmentation in Cluttered Background

Cui, Zhenchao; Zhou, Guoyu; Qi, Jing; Wang, Huimin; Ding, Xilun

doi:10.1007/978-3-031-20497-5_44

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13604)

Included in the following conference series:

CAAI International Conference on Artificial Intelligence

Abstract

Hand gesture segmentation is an initial and essential step in classifying hand gestures, which provide a simple, intuitive, concise and natural means of human–computer and human–robot interaction. However, segmenting hand gestures of varied shapes against a cluttered background remains a challenging problem. To address it, a Multi-Branch Cascade Transformer Network (MBCT–Net) is proposed to segment hand regions from cluttered backgrounds. MBCT–Net is based on encoder-decoder convolutional neural networks; its encoder consists of a deep convolutional neural network (DCNN) module and a multi-branch cascade Transformer (MBCT) module. The MBCT module is designed to represent both the local details and the global semantic information of hand gestures. To enhance semantic interaction between different windows and to expand the receptive field of MBCT–Net, we design a multi-window self-attention (MWSA) block in each branch of the MBCT module to extract hand gesture features. The MWSA block not only reduces the amount of computation but also strengthens semantic interaction between windows. Experiments were conducted to verify the effectiveness of the proposed MBCT–Net, and the results support it.
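The abstract does not describe how the MWSA block is built, so the following is only a generic sketch of window-restricted self-attention, not the authors' design. It illustrates why limiting attention to windows reduces computation: on an 8×8 token grid, global attention scores 4096 query-key pairs, while 4×4 windows score 1024. All function names, the non-overlapping partition, and the use of raw tokens as queries, keys and values (no learned projections) are assumptions made for illustration.

```python
import math


def _check_divisible(height, width, window):
    # Assumption: the grid splits evenly into non-overlapping windows.
    if height % window or width % window:
        raise ValueError("grid size must be divisible by the window size")


def attention_pairs_global(height, width):
    """Query-key pairs when every token attends to every token."""
    n = height * width
    return n * n


def attention_pairs_windowed(height, width, window):
    """Query-key pairs when attention stays inside window x window patches."""
    _check_divisible(height, width, window)
    num_windows = (height // window) * (width // window)
    tokens_per_window = window * window
    return num_windows * tokens_per_window ** 2


def _softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def window_self_attention(tokens, height, width, window):
    """Single-head self-attention computed independently in each window.

    tokens: row-major list of height*width feature vectors (lists of floats).
    Hypothetical simplification: queries, keys and values are the tokens
    themselves, with scaled dot-product scores.
    """
    _check_divisible(height, width, window)
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    out = [None] * len(tokens)
    for top in range(0, height, window):
        for left in range(0, width, window):
            # Flat indices of the tokens that belong to this window.
            idx = [r * width + c
                   for r in range(top, top + window)
                   for c in range(left, left + window)]
            for i in idx:
                scores = [scale * sum(a * b for a, b in zip(tokens[i], tokens[j]))
                          for j in idx]
                weights = _softmax(scores)
                out[i] = [sum(w * tokens[j][d] for w, j in zip(weights, idx))
                          for d in range(dim)]
    return out


if __name__ == "__main__":
    print(attention_pairs_global(8, 8))       # 4096
    print(attention_pairs_windowed(8, 8, 4))  # 1024
```

In this plain form, tokens in different windows never exchange information. Window-based designs therefore need some additional mechanism for cross-window interaction, which is the role the abstract attributes to the MWSA block.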

Acknowledgment

This work was supported by the Scientific Research Foundation for Talented Scholars of Hebei University (Grant No. 521100221081), the Innovation and Entrepreneurship Training Program for College Students of Hebei University (Grant No. 2022156), the Scientific Research Foundation of Colleges and Universities in Hebei Province (Grant No. QN2022107) and the Science and Technology Program of Hebei Province (Grant No. 22370301D).

Author information

Authors and Affiliations

School of Cyber Security and Computer, Hebei University, Baoding, China
Zhenchao Cui, Guoyu Zhou & Jing Qi
Beijing University of Technology, Beijing, China
Huimin Wang
Robotics Institute, Beihang University, Beijing, China
Xilun Ding

Corresponding author

Correspondence to Jing Qi.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Lu Fang
Xiaomi Inc., Beijing, China
Daniel Povey
Shanghai Jiao Tong University, Shanghai, China
Guangtao Zhai
JD Explore Academy, Beijing, China
Tao Mei
Chinese Academy of Sciences, Beijing, China
Ruiping Wang

About this paper

Cite this paper

Cui, Z., Zhou, G., Qi, J., Wang, H., Ding, X. (2022). A Multi-branch Cascade Transformer Network (MBCT–Net) for Hand Gesture Segmentation in Cluttered Background. In: Fang, L., Povey, D., Zhai, G., Mei, T., Wang, R. (eds) Artificial Intelligence. CICAI 2022. Lecture Notes in Computer Science, vol 13604. Springer, Cham. https://doi.org/10.1007/978-3-031-20497-5_44

DOI: https://doi.org/10.1007/978-3-031-20497-5_44
Published: 17 December 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20496-8
Online ISBN: 978-3-031-20497-5
eBook Packages: Computer Science, Computer Science (R0)
