Hand gesture segmentation against complex background based on improved atrous spatial pyramid pooling

Cui, Zhenchao; Lei, Yu; Wang, Yuxiao; Yang, Wenzhu; Qi, Jing

doi:10.1007/s12652-022-03736-w

Original Research
Published: 02 February 2022

Volume 14, pages 11795–11807, (2023)

Journal of Ambient Intelligence and Humanized Computing

Zhenchao Cui¹,², Yu Lei¹,², Yuxiao Wang¹, Wenzhu Yang² & Jing Qi¹ (ORCID: orcid.org/0000-0002-9383-9440)

Abstract

Gesture segmentation is an essential part of gesture detection: removing the non-hand background from an image improves detection accuracy. However, the inaccurate features extracted by current methods can greatly reduce the accuracy of both segmentation and gesture recognition. To solve this problem and obtain accurate features, this paper proposes improved atrous spatial pyramid pooling (IASPP). IASPP is a pooling layer for convolutional neural networks that refines features by connecting the cascade model and the parallel model of atrous spatial pyramid pooling. In addition, to improve segmentation performance by integrating detail and spatial location information from different levels, IASPP is embedded in an encoder-decoder; we name the resulting gesture segmentation method improved atrous spatial pyramid pooling-ResNet (IASPP-ResNet). In the experiments, we compare the proposed method with state-of-the-art methods on two datasets, OUHANDS and HGR. IASPP-ResNet achieves 97.75% Pixel Accuracy and 89.60% MIoU on the OUHANDS dataset, and 99.09% Pixel Accuracy and 97.52% MIoU on the HGR dataset. These results show that our method is superior to the state of the art.
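
The two reported metrics, Pixel Accuracy and MIoU, can be illustrated with a minimal pure-Python sketch. It uses the standard definitions and assumes two classes (0 = background, 1 = hand) with MIoU averaged over both; the abstract does not state the exact evaluation protocol. The function names and the toy masks are hypothetical and are not taken from the paper.

```python
def confusion_matrix(truth, pred, num_classes=2):
    """Count pixels: rows are true classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t_row, p_row in zip(truth, pred):
        for t, p in zip(t_row, p_row):
            cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """Fraction of all pixels whose predicted class equals the true class."""
    correct = sum(cm[c][c] for c in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def mean_iou(cm):
    """Average over classes of TP / (TP + FP + FN); absent classes are skipped."""
    ious = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                    # true class c, predicted otherwise
        fp = sum(row[c] for row in cm) - tp     # predicted c, true class otherwise
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious)


# Toy 4x4 masks: 0 = background, 1 = hand (made-up data, not from the paper).
truth = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

cm = confusion_matrix(truth, pred)
print("Pixel Accuracy:", pixel_accuracy(cm))
print("MIoU:", mean_iou(cm))
```

On these toy masks 14 of 16 pixels are labelled correctly, so Pixel Accuracy is 0.875, while the hand class has an IoU of only 3/5. This shows why MIoU is usually the stricter figure when the hand occupies a small part of the image, as in the OUHANDS results above.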

Acknowledgements

This work was supported by the Scientific Research Foundation for Advanced Talents of Hebei University (521100221081) and the Post graduate's Innovation Fund Project of Hebei University (HBU2021ss061).

Author information

Authors and Affiliations

School of Cyber Security and Computer, Hebei University, Baoding, 071002, Hebei, China
Zhenchao Cui, Yu Lei, Yuxiao Wang & Jing Qi
Hebei Machine Vision Engineering Research Center (Hebei University), Baoding, 071002, Hebei, China
Zhenchao Cui, Yu Lei & Wenzhu Yang

Corresponding author

Correspondence to Jing Qi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Cui, Z., Lei, Y., Wang, Y. et al. Hand gesture segmentation against complex background based on improved atrous spatial pyramid pooling. J Ambient Intell Human Comput 14, 11795–11807 (2023). https://doi.org/10.1007/s12652-022-03736-w

Received: 15 April 2021
Accepted: 19 January 2022
Published: 02 February 2022
Issue Date: September 2023
DOI: https://doi.org/10.1007/s12652-022-03736-w
