Abstract
Hand gesture segmentation is an essential first step in classifying hand gestures, which provide a simple, intuitive, concise and natural means of human–computer and human–robot interaction. However, segmenting hand gestures with varied hand shapes against a cluttered background remains a challenging problem. To address it, we propose a Multi-Branch Cascade Transformer Network (MBCT–Net), built on an encoder-decoder convolutional neural network, to segment hand regions from cluttered backgrounds. The encoder of the MBCT–Net consists of a deep convolutional neural network (DCNN) module and a multi-branch cascade Transformer (MBCT) module. The MBCT module is designed to represent both the local details and the global semantic information of hand gestures. To enhance semantic interaction between different windows and expand the receptive field of the MBCT–Net, we design a multi-window self-attention (MWSA) block in each branch of the MBCT module to extract features of hand gestures. The MWSA block both reduces the amount of computation and enhances semantic interaction between different windows. Experiments were conducted to verify the proposed MBCT–Net, and the results demonstrate its effectiveness.
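To make the computational argument concrete, the following is a minimal sketch of generic window-based self-attention, in which attention is computed independently inside non-overlapping windows of a feature map. It is not the authors' MWSA block, whose design the abstract does not specify. The function names (`window_partition`, `self_attention`, `window_self_attention`, `score_count`) are hypothetical, and learned query, key and value projections are omitted for brevity.

```python
import math

def window_partition(grid, m):
    """Split an H x W grid of feature vectors into non-overlapping m x m windows.

    Each window is returned as a flat list of m * m tokens (row-major).
    """
    h, w = len(grid), len(grid[0])
    if h % m or w % m:
        raise ValueError("H and W must be multiples of the window size")
    windows = []
    for top in range(0, h, m):
        for left in range(0, w, m):
            windows.append([grid[r][c]
                            for r in range(top, top + m)
                            for c in range(left, left + m)])
    return windows

def self_attention(tokens):
    """Single-head scaled dot-product attention with Q = K = V = tokens.

    Assumption: learned projections are left out to keep the sketch short.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        top = max(scores)  # subtract the maximum for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(wt * v[d] for wt, v in zip(weights, tokens)) / total
                    for d in range(dim)])
    return out

def window_self_attention(grid, m):
    """Apply self-attention separately inside every m x m window."""
    return [self_attention(win) for win in window_partition(grid, m)]

def score_count(h, w, m=None):
    """Query-key pairs scored: global attention if m is None, else windowed."""
    if m is None:
        return (h * w) ** 2
    return (h // m) * (w // m) * (m * m) ** 2
```

For a 32 x 32 feature map, `score_count(32, 32)` gives 1,048,576 query-key pairs for global attention, while `score_count(32, 32, 8)` gives 65,536 for 8 x 8 windows, a 16-fold reduction. The trade-off is that tokens in different windows no longer attend to each other, which is why window-based designs need some mechanism for cross-window interaction.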
Acknowledgment
This work was supported by the Scientific Research Foundation for Talented Scholars of Hebei University (Grant No. 521100221081), the Innovation and Entrepreneurship Training Program for College Students of Hebei University (Grant No. 2022156), the Scientific Research Foundation of Colleges and Universities in Hebei Province (Grant No. QN2022107), and the Science and Technology Program of Hebei Province (Grant No. 22370301D).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cui, Z., Zhou, G., Qi, J., Wang, H., Ding, X. (2022). A Multi-branch Cascade Transformer Network (MBCT–Net) for Hand Gesture Segmentation in Cluttered Background. In: Fang, L., Povey, D., Zhai, G., Mei, T., Wang, R. (eds) Artificial Intelligence. CICAI 2022. Lecture Notes in Computer Science, vol 13604. Springer, Cham. https://doi.org/10.1007/978-3-031-20497-5_44
DOI: https://doi.org/10.1007/978-3-031-20497-5_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20496-8
Online ISBN: 978-3-031-20497-5
eBook Packages: Computer Science, Computer Science (R0)