
Hand gesture segmentation against complex background based on improved atrous spatial pyramid pooling

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

Gesture segmentation is an essential part of gesture detection. Using gesture segmentation to remove the background from hand images can improve the accuracy of gesture detection. However, the inaccurate features extracted by current methods can greatly reduce the accuracy of both segmentation and gesture recognition. To solve this problem and obtain accurate features, this paper proposes improved atrous spatial pyramid pooling (IASPP). IASPP is a pooling layer for convolutional neural networks that refines features by combining the cascade and parallel modes of atrous spatial pyramid pooling. In addition, to improve segmentation performance by integrating detail and spatial location information from different levels, IASPP is embedded in an encoder-decoder structure; we call the resulting gesture segmentation method improved atrous spatial pyramid pooling-ResNet (IASPP-ResNet). In the experiments, we compare the proposed method with state-of-the-art methods on two datasets, OUHANDS and HGR. IASPP-ResNet achieves 97.75% Pixel Accuracy and 89.60% MIoU on the OUHANDS dataset, and 99.09% Pixel Accuracy and 97.52% MIoU on the HGR dataset. These results show that our method outperforms the state of the art.
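The abstract reports Pixel Accuracy and MIoU without defining them. As a minimal sketch, assuming the standard confusion-matrix definitions and a two-class labelling (0 = background, 1 = hand), the two metrics can be computed as follows. The function names and the toy masks are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes=2):
    """Count pixels for every (true class, predicted class) pair."""
    pred = np.asarray(pred).astype(np.int64).ravel()
    target = np.asarray(target).astype(np.int64).ravel()
    # Encode each (target, pred) pair as a single index, then histogram it.
    index = target * num_classes + pred
    counts = np.bincount(index, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def pixel_accuracy(cm):
    """Fraction of all pixels whose predicted class matches the label."""
    return np.trace(cm) / cm.sum()

def mean_iou(cm):
    """Mean over classes of intersection / union, skipping absent classes."""
    intersection = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection
    valid = union > 0
    return (intersection[valid] / union[valid]).mean()

# Toy 4x4 masks (assumed data): 1 marks hand pixels, 0 marks background.
target = np.array([[0, 0, 0, 0],
                   [0, 1, 1, 0],
                   [0, 1, 1, 0],
                   [0, 0, 0, 0]])
pred = np.array([[0, 0, 0, 0],
                 [0, 1, 1, 1],
                 [0, 1, 0, 0],
                 [0, 0, 0, 0]])

cm = confusion_matrix(pred, target)
print("Pixel Accuracy:", pixel_accuracy(cm))
print("MIoU:", mean_iou(cm))
```

On these toy masks, one missed hand pixel and one false hand pixel lower MIoU far more than Pixel Accuracy, because the hand class is small. This may help explain why the reported MIoU values sit below the Pixel Accuracy values on both datasets.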



Acknowledgements

This work was supported by the Scientific Research Foundation for Advanced Talents of Hebei University (521100221081) and the Post graduate's Innovation Fund Project of Hebei University (HBU2021ss061).

Author information

Corresponding author

Correspondence to Jing Qi.



About this article

Cite this article

Cui, Z., Lei, Y., Wang, Y. et al. Hand gesture segmentation against complex background based on improved atrous spatial pyramid pooling. J Ambient Intell Human Comput 14, 11795–11807 (2023). https://doi.org/10.1007/s12652-022-03736-w

  • DOI: https://doi.org/10.1007/s12652-022-03736-w
