Skip to main content

LiteHandNet: A Lightweight Hand Pose Estimation Network via Structural Feature Enhancement

  • Conference paper
  • First Online:
MultiMedia Modeling (MMM 2023)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13833))

Included in the following conference series:

  • 1877 Accesses

Abstract

This paper presents a real-time lightweight network, LiteHandNet, for 2D hand pose estimation from monocular color images. In recent years, keypoint heatmap representation is dominant in pose estimation due to its high accuracy. Nevertheless, keypoint heatmaps require high-resolution representation to extract accurate spatial features, which commonly means high computational costs, e.g., high delay and tremendous model parameters. Therefore, the existing heatmap-based methods are not suitable for the scenes with computation-limited resources and high real-time requirements. We find that high-resolution representation can obtain more clear structural features of a hand, e.g., contours and key regions, which can provide high-quality spatial features to the keypoint heatmap, thus improving the robustness and accuracy of a model. To fully extract the structural features without introducing unnecessary computational costs, we propose a lightweight module, which consists of two parts: a multi-scale feature block (MSFB) and a spatial channel attention block (SCAB). MSFB can extract structural features from hands using multi-scale information, while SCAB can further screen out high-quality structural features and suppress low-quality features. Comprehensive experimental results verify that our model is state-of-the-art in terms of the tradeoff between accuracy, speed, and parameters.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. Chen, Y., et al.: Nonparametric structure regularization machine for 2D hand pose estimation. In: WACV, pp. 370–379 (2020)

    Google Scholar 

  2. Cheng, B., Xiao, B., Wang, J., Shi, H., Huang, T.S., Zhang, L.: Higherhrnet: scale-aware representation learning for bottom-up human pose estimation. In: CVPR, pp. 5385–5394 (2020)

    Google Scholar 

  3. Gu, K., Yang, L., Yao, A.: Removing the bias of integral pose regression. In: ICCV, pp. 11047–11056 (2021)

    Google Scholar 

  4. Han, K., Wang, Y., Tian, Q., Guo, J., Xu, C., Xu, C.: Ghostnet: more features from cheap operations. In: CVPR, pp. 1577–1586 (2020)

    Google Scholar 

  5. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)

    Google Scholar 

  6. Howard, A.G., et al.: Mobilenets: efficient convolutional neural networks for mobile vision applications. arXiv (2017)

    Google Scholar 

  7. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: CVPR, pp. 7132–7141 (2018)

    Google Scholar 

  8. Ioannou, Y., Robertson, D.P., Cipolla, R., Criminisi, A.: Deep roots: improving CNN efficiency with hierarchical filter groups. In: CVPR, pp. 5977–5986 (2017)

    Google Scholar 

  9. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2015)

    Google Scholar 

  10. Kong, D., Ma, H., Chen, Y., Xie, X.: Rotation-invariant mixed graphical model network for 2d hand pose estimation. In: WACV, pp. 1535–1544 (2020)

    Google Scholar 

  11. Kong, D., Ma, H., Xie, X.: SIA-GCN: a spatial information aware graph neural network with 2D convolutions for hand pose estimation. In: BMVC (2020)

    Google Scholar 

  12. Li, J., et al.: Human pose regression with residual log-likelihood estimation. In: ICCV, pp. 11005–11014 (2021)

    Google Scholar 

  13. Li, J., Fang, F., Mei, K., Zhang, G.: Multi-scale residual network for image super-resolution. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11212, pp. 527–542. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01237-3_32

    Chapter  Google Scholar 

  14. Li, X., Wang, W., Hu, X., Yang, J.: Selective kernel networks. In: CVPR, pp. 510–519 (2019)

    Google Scholar 

  15. Li, Y., et al.: Is 2D heatmap representation even necessary for human pose estimation? arXiv (2021)

    Google Scholar 

  16. Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: ShuffleNet V2: practical guidelines for efficient CNN architecture design. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018. LNCS, vol. 11218, pp. 122–138. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01264-9_8

    Chapter  Google Scholar 

  17. Newell, A., Yang, K., Deng, J.: Stacked hourglass networks for human pose estimation. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 483–499. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_29

    Chapter  Google Scholar 

  18. Nibali, A., He, Z., Morgan, S., Prendergast, L.A.: Numerical coordinate regression with convolutional neural networks. arXiv (2018)

    Google Scholar 

  19. Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A., Chen, L.: Mobilenetv 2: inverted residuals and linear bottlenecks. In: CVPR, pp. 4510–4520 (2018)

    Google Scholar 

  20. Santavas, N., Kansizoglou, I., Bampis, L., Karakasis, E.G., Gasteratos, A.: Attention! A lightweight 2D hand pose estimation approach. arXiv (2020)

    Google Scholar 

  21. Simon, T., Joo, H., Matthews, I.A., Sheikh, Y.: Hand keypoint detection in single images using multiview bootstrapping. In: CVPR, pp. 4645–4653 (2017)

    Google Scholar 

  22. Sun, K., Xiao, B., Liu, D., Wang, J.: Deep high-resolution representation learning for human pose estimation. In: CVPR, pp. 5693–5703 (2019)

    Google Scholar 

  23. Toshev, A., Szegedy, C.: Deeppose: human pose estimation via deep neural networks. In: CVPR, pp. 1653–1660 (2014)

    Google Scholar 

  24. Wang, R.J., Li, X., Ao, S., Ling, C.X.: PELEE: a real-time object detection system on mobile devices. In: ICLR (2018)

    Google Scholar 

  25. Wang, Y., Peng, C., Liu, Y.: Mask-pose cascaded CNN for 2D hand pose estimation from single color image. IEEE Trans. Circuits Syst. Video Technol. 29, 3258–3268 (2019)

    Article  Google Scholar 

  26. Wang, Y., Zhang, B., Peng, C.: SrhandNet: real-time 2D hand pose estimation with simultaneous region localization. IEEE Trans. Image Process. 29, 2977–2986 (2020)

    Article  MATH  Google Scholar 

  27. Wang, Z., Nie, X., Qu, X., Chen, Y., Liu, S.: Distribution-aware single-stage models for multi-person 3d pose estimation. arXiv (2022)

    Google Scholar 

  28. Wei, S., Ramakrishna, V., Kanade, T., Sheikh, Y.: Convolutional pose machines. In: CVPR, pp. 4724–4732 (2016)

    Google Scholar 

  29. Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: CBAM: convolutional block attention module. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 3–19. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_1

    Chapter  Google Scholar 

  30. Yu, C., et al.: Lite-hrNet: a lightweight high-resolution network. In: CVPR, pp. 10440–10450 (2021)

    Google Scholar 

  31. Zhang, F., Zhu, X., Dai, H., Ye, M., Zhu, C.: Distribution-aware coordinate representation for human pose estimation. In: CVPR, pp. 7091–7100 (2020)

    Google Scholar 

  32. Zhang, Z., Tang, J., Wu, G.: Simple and lightweight human pose estimation. arXiv (2019)

    Google Scholar 

  33. Zhao, H., et al.: PSANet: point-wise spatial attention network for scene parsing. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11213, pp. 270–286. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01240-3_17

    Chapter  Google Scholar 

  34. Zimmermann, C., Brox, T.: Learning to estimate 3D hand pose from single RGB images. In: ICCV, pp. 4913–4921 (2017)

    Google Scholar 

  35. Zimmermann, C., Ceylan, D., Yang, J., Russell, B.C., Argus, M.J., Brox, T.: Freihand: a dataset for markerless capture of hand pose and shape from single RGB images. In: ICCV, pp. 813–822 (2019)

    Google Scholar 

Download references

Acknowledgements

The research is supported by National Key Research and Development Program of China (2020AAA0109701), National Natural Science Foundation of China (62076024, 62006018).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Xu-Cheng Yin .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Huang, ZY., Chen, SL., Liu, Q., Zhang, CJ., Chen, F., Yin, XC. (2023). LiteHandNet: A Lightweight Hand Pose Estimation Network via Structural Feature Enhancement. In: Dang-Nguyen, DT., et al. MultiMedia Modeling. MMM 2023. Lecture Notes in Computer Science, vol 13833. Springer, Cham. https://doi.org/10.1007/978-3-031-27077-2_25

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-27077-2_25

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-27076-5

  • Online ISBN: 978-3-031-27077-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics