Abstract
Hand gesture recognition is an important research field in computer vision. To address the problem of low hand gesture recognition accuracy, we propose two modules based on atrous convolution: a Multi-Scale Fusion (MSF) module and a Light-Weight Multi-Scale (LWMS) module. The MSF module extracts multi-scale features at different receptive fields, while the LWMS module can be regarded as an enhanced and expanded convolutional operation. Building on these two modules, we design a Hand Gesture Recognition Approach (HGRA), an end-to-end CNN-based framework with two branches. One branch combines U-Net with a Multi-Scale Attention module to segment hand gestures from complex backgrounds; the segmentation result is then used to extract shape features. The other branch extracts visual features such as appearance and color. The shape and visual features from the two branches are fused to perform hand gesture recognition. Experimental results on the OUHANDS and HGR1 gesture datasets show that the proposed method achieves competitive performance in both hand gesture segmentation and recognition.
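The central idea behind the MSF module, as described above, is to apply convolution at several atrous (dilation) rates so that the same kernel sees different receptive fields. The sketch below illustrates only that idea; it is not the authors' implementation. The names `atrous_conv1d` and `msf_like_fusion` are hypothetical, and a 1-D pure-Python convolution stands in for the paper's 2-D CNN layers.

```python
def atrous_conv1d(x, kernel, rate):
    """1-D atrous (dilated) convolution with zero 'same' padding.

    The effective receptive field is (len(kernel) - 1) * rate + 1,
    so a larger rate covers a wider context with the same number
    of kernel parameters.
    """
    k = len(kernel)
    pad = ((k - 1) * rate + 1) // 2
    xp = [0.0] * pad + list(x) + [0.0] * pad
    # For each output position, sample the padded input every `rate` steps.
    return [sum(kernel[j] * xp[i + j * rate] for j in range(k))
            for i in range(len(x))]

def msf_like_fusion(x, kernel, rates=(1, 2, 4)):
    """MSF-style multi-scale extraction (hypothetical sketch): run the
    same kernel at several dilation rates and collect one feature map
    per receptive-field size for later fusion."""
    return [atrous_conv1d(x, kernel, r) for r in rates]
```

For example, `msf_like_fusion([1.0, 2.0, 3.0], [0.0, 1.0, 0.0])` returns three copies of the input, since the identity kernel passes the signal through unchanged at every dilation rate; a non-trivial kernel would instead produce one smoothed or sharpened response per scale.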










Funding
This work was supported in part by the National Natural Science Foundation of China [grant number 61379065] and the Natural Science Foundation of Hebei Province, China [grant number F2019203285].
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Wang, S., Zhang, S., Zhang, X. et al. A two-branch hand gesture recognition approach combining atrous convolution and attention mechanism. Vis Comput 39, 4487–4500 (2023). https://doi.org/10.1007/s00371-022-02602-2