Abstract
Gesture segmentation is an essential part of gesture detection. Removing the background from hand images through gesture segmentation can improve the accuracy of gesture detection. However, the inaccurate features extracted by current methods can greatly reduce the accuracy of both segmentation and gesture recognition. To solve this problem and obtain accurate features, this paper proposes improved atrous spatial pyramid pooling (IASPP). IASPP is a pooling layer in a convolutional neural network that refines features by connecting the cascade model and the parallel model of atrous spatial pyramid pooling. In addition, to improve segmentation performance by integrating detail and spatial location information at different levels, IASPP is embedded in an encoder-decoder; we name the resulting method improved atrous spatial pyramid pooling-ResNet (IASPP-ResNet) for gesture segmentation. In the experiments, we compare the proposed method with state-of-the-art methods on two datasets, OUHANDS and HGR. IASPP-ResNet achieves 97.75% Pixel Accuracy and 89.60% MIoU on the OUHANDS dataset, and 99.09% Pixel Accuracy and 97.52% MIoU on the HGR dataset. These results show that our method outperforms the state of the art.
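The abstract reports Pixel Accuracy and MIoU without defining them. As a minimal sketch, assuming the standard definitions (pixel accuracy as the fraction of correctly labelled pixels, MIoU as the per-class intersection over union averaged across classes), the two metrics can be computed from a confusion matrix as below. The function names and the flattened label lists are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], truth: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels; rows index the true class, columns the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, truth):
        matrix[t][p] += 1
    return matrix


def pixel_accuracy(matrix: List[List[int]]) -> float:
    """Fraction of pixels whose predicted class equals the true class."""
    correct = sum(matrix[c][c] for c in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total  # assumes at least one pixel


def mean_iou(matrix: List[List[int]]) -> float:
    """Average of per-class IoU = TP / (TP + FP + FN)."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                        # true class c, predicted otherwise
        fp = sum(matrix[r][c] for r in range(n)) - tp   # predicted c, true class differs
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / union)
    return sum(ious) / len(ious)


# Toy example: 0 = background, 1 = hand, masks flattened to 1-D lists.
truth = [0, 0, 0, 0, 1, 1, 1, 1]
pred = [0, 0, 0, 1, 1, 1, 1, 0]
cm = confusion_matrix(pred, truth, num_classes=2)
pa = pixel_accuracy(cm)
miou = mean_iou(cm)
```

In this toy case six of eight pixels are correct, so the pixel accuracy is 0.75, while each class has an IoU of 3/5, giving an MIoU of 0.6. The gap illustrates why MIoU is usually the stricter of the two figures: every misclassified pixel counts against two classes at once.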
Acknowledgements
This work was supported by the Scientific Research Foundation for Advanced Talents of Hebei University (521100221081) and the Post graduate's Innovation Fund Project of Hebei University (HBU2021ss061).
Cite this article
Cui, Z., Lei, Y., Wang, Y. et al. Hand gesture segmentation against complex background based on improved atrous spatial pyramid pooling. J Ambient Intell Human Comput 14, 11795–11807 (2023). https://doi.org/10.1007/s12652-022-03736-w
DOI: https://doi.org/10.1007/s12652-022-03736-w