Abstract
Hand gesture recognition is an important research field in computer vision. To address the problem of low hand gesture recognition accuracy, we propose two modules based on atrous convolution: a Multi-Scale Fusion (MSF) module and a Light-Weight Multi-Scale (LWMS) module. The MSF module extracts multi-scale features at different receptive fields, while the LWMS module can be regarded as an enhanced and expanded convolutional operation. Building on these two modules, we design a Hand Gesture Recognition Approach (HGRA), an end-to-end CNN-based framework with two branches. One branch combines U-Net with a Multi-Scale Attention module to segment hand gestures from complex backgrounds; the segmentation result is then used to extract shape features. The other branch extracts visual features such as appearance and color. The shape and visual features from the two branches are fused to perform hand gesture recognition. Experimental results on the OUHANDS and HGR1 gesture datasets show that the proposed method achieves competitive performance in both hand gesture segmentation and recognition.
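The central idea behind the MSF module, as described above, is to apply convolution at several atrous (dilation) rates so that the same kernel sees different receptive fields. The sketch below illustrates only that idea; it is not the authors' implementation. The names `atrous_conv1d` and `msf_like_fusion` are hypothetical, and a 1-D pure-Python convolution stands in for the paper's 2-D CNN layers.

```python
def atrous_conv1d(x, kernel, rate):
    """1-D atrous (dilated) convolution with zero 'same' padding.

    The effective receptive field is (len(kernel) - 1) * rate + 1,
    so a larger rate covers a wider context with the same number
    of kernel parameters.
    """
    k = len(kernel)
    pad = ((k - 1) * rate + 1) // 2
    xp = [0.0] * pad + list(x) + [0.0] * pad
    # For each output position, sample the padded input every `rate` steps.
    return [sum(kernel[j] * xp[i + j * rate] for j in range(k))
            for i in range(len(x))]

def msf_like_fusion(x, kernel, rates=(1, 2, 4)):
    """MSF-style multi-scale extraction (hypothetical sketch): run the
    same kernel at several dilation rates and collect one feature map
    per receptive-field size for later fusion."""
    return [atrous_conv1d(x, kernel, r) for r in rates]
```

For example, `msf_like_fusion([1.0, 2.0, 3.0], [0.0, 1.0, 0.0])` returns three copies of the input, since the identity kernel passes the signal through unchanged at every dilation rate; a non-trivial kernel would instead produce one smoothed or sharpened response per scale.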










Funding
This work was supported in part by the National Natural Science Foundation of China [grant number 61379065] and the Natural Science Foundation of Hebei Province, China [grant number F2019203285].
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Wang, S., Zhang, S., Zhang, X. et al. A two-branch hand gesture recognition approach combining atrous convolution and attention mechanism. Vis Comput 39, 4487–4500 (2023). https://doi.org/10.1007/s00371-022-02602-2