
PEA-YOLO: a lightweight network for static gesture recognition combining multiscale and attention mechanisms

  • Original Paper
Signal, Image and Video Processing

Abstract

Gesture recognition is one of the most intuitive and natural ways for humans to communicate with computers and has been widely used in human–computer interaction applications. However, it remains a challenging problem due to interference such as varied backgrounds, hand-like objects, and lighting changes. In this article, a lightweight static gesture recognition network, named PEA-YOLO, is put forward. The network adopts the idea of an adaptive spatial feature pyramid and combines an attention mechanism with a multi-path feature fusion method to improve the localization and recognition of gesture features. First, an Efficient Channel Attention module is added after the backbone network to focus the model's attention on the gesture. Second, the Feature Pyramid Network is replaced by a Path Aggregation Network to localize the gesture better. Finally, an Adaptive Spatial Feature Fusion module is added before the YOLO head to further reduce the false detection rate in gesture recognition. Experiments conducted on the OUHANDS and NUSII datasets show that PEA-YOLO achieves favorable static gesture recognition performance with only 8.57 M parameters. Compared with other state-of-the-art methods, the proposed lightweight network obtains the highest accuracy at a much higher speed and with fewer parameters.
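As a rough illustration of the Efficient Channel Attention step mentioned in the abstract, the following is a minimal NumPy sketch: global average pooling produces one descriptor per channel, a small 1-D convolution mixes neighboring channel descriptors, and a sigmoid gate rescales each channel. The fixed averaging kernel here stands in for the learned 1-D convolution weights, and the function name and tensor shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def eca_attention(feature_map, kernel_size=3):
    """Efficient Channel Attention, minimal sketch.

    feature_map: array of shape (C, H, W).
    Returns the channel-recalibrated feature map, same shape.
    """
    C, H, W = feature_map.shape
    # 1) Global average pooling: one scalar descriptor per channel.
    desc = feature_map.mean(axis=(1, 2))              # shape (C,)
    # 2) Local cross-channel interaction: a 1-D convolution of size k
    #    over the channel descriptors (zero padding keeps length C).
    pad = kernel_size // 2
    padded = np.pad(desc, pad)
    kernel = np.full(kernel_size, 1.0 / kernel_size)  # fixed weights, illustrative only
    mixed = np.convolve(padded, kernel, mode="valid")  # shape (C,)
    # 3) Sigmoid gate in (0, 1), then rescale each channel.
    gate = 1.0 / (1.0 + np.exp(-mixed))
    return feature_map * gate[:, None, None]
```

In the actual module the 1-D convolution weights are learned, which lets the network emphasize gesture-relevant channels while suppressing background ones at negligible parameter cost.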


Data availability

The data and materials that support the findings of this study are available on request from the authors.


Funding

This research was funded in part by the State Key Laboratory of ASIC & System (2021KF010) and National Natural Science Foundation of China (Grant No. 61404083).

Author information

Authors and Affiliations

Authors

Contributions

WZ contributed to conceptualization, methodology, resources, supervision, writing—review and editing, project administration. XL contributed to methodology, software, validation, formal analysis, investigation, data curation, writing—original draft, and visualization.

Corresponding author

Correspondence to Weina Zhou.

Ethics declarations

Conflict of interest

The authors declare no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhou, W., Li, X. PEA-YOLO: a lightweight network for static gesture recognition combining multiscale and attention mechanisms. SIViP 18, 597–605 (2024). https://doi.org/10.1007/s11760-023-02755-0

