LiteHandNet: A Lightweight Hand Pose Estimation Network via Structural Feature Enhancement

Huang, Zhi-Yong; Chen, Song-Lu; Liu, Qi; Zhang, Chong-Jian; Chen, Feng; Yin, Xu-Cheng

doi:10.1007/978-3-031-27077-2_25

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13833)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

This paper presents LiteHandNet, a real-time lightweight network for 2D hand pose estimation from monocular color images. In recent years, keypoint heatmap representation has been dominant in pose estimation due to its high accuracy. Nevertheless, keypoint heatmaps require a high-resolution representation to extract accurate spatial features, which commonly means high computational cost, e.g., high latency and a large number of model parameters. Existing heatmap-based methods are therefore not suitable for scenarios with limited computational resources and strict real-time requirements. We find that a high-resolution representation yields clearer structural features of a hand, e.g., contours and key regions, which provide high-quality spatial features to the keypoint heatmap and thus improve the robustness and accuracy of a model. To fully extract these structural features without introducing unnecessary computational cost, we propose a lightweight module that consists of two parts: a multi-scale feature block (MSFB) and a spatial channel attention block (SCAB). MSFB extracts structural features from hands using multi-scale information, while SCAB further selects high-quality structural features and suppresses low-quality ones. Comprehensive experimental results verify that our model is state-of-the-art in terms of the tradeoff between accuracy, speed, and number of parameters.
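The link between heatmap resolution and localization accuracy mentioned in the abstract can be illustrated with a minimal sketch. This is not the LiteHandNet implementation: the function names, the Gaussian target with sigma = 1, and the plain argmax decoding are assumptions chosen only to show how a coarser heatmap grid increases the quantization error of a decoded keypoint.

```python
import math

def render_heatmap(x, y, size, sigma=1.0):
    """Render a size x size Gaussian heatmap peaked at (x, y), given in heatmap pixels."""
    return [[math.exp(-((c - x) ** 2 + (r - y) ** 2) / (2 * sigma ** 2))
             for c in range(size)]
            for r in range(size)]

def decode_argmax(heatmap):
    """Return the (column, row) index of the strongest heatmap cell."""
    _, row, col = max((v, r, c)
                      for r, line in enumerate(heatmap)
                      for c, v in enumerate(line))
    return col, row

def roundtrip_error(x_img, y_img, image_size, heatmap_size):
    """Encode a keypoint as a heatmap, decode it, and return the error in image pixels."""
    scale = image_size / heatmap_size  # image pixels covered by one heatmap cell
    heatmap = render_heatmap(x_img / scale, y_img / scale, heatmap_size)
    col, row = decode_argmax(heatmap)
    return math.hypot(col * scale - x_img, row * scale - y_img)

if __name__ == "__main__":
    # Hypothetical keypoint at (101, 57) in a 256 x 256 image.
    for heatmap_size in (256, 64, 16):
        err = roundtrip_error(101.0, 57.0, 256, heatmap_size)
        print(f"heatmap {heatmap_size:>3} x {heatmap_size:<3} -> error {err:.2f} px")
```

For this hypothetical keypoint, the round trip is exact at full resolution, off by about 1.4 pixels with a 64 x 64 heatmap, and off by about 8.6 pixels with a 16 x 16 heatmap, which is why heatmap-based methods tend to keep high-resolution feature maps despite their cost.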

Acknowledgements

This research is supported by the National Key Research and Development Program of China (2020AAA0109701) and the National Natural Science Foundation of China (62076024, 62006018).

Author information

Authors and Affiliations

University of Science and Technology Beijing, Beijing, 100083, China
Zhi-Yong Huang, Song-Lu Chen, Qi Liu, Chong-Jian Zhang & Xu-Cheng Yin
USTB-EEasyTech Joint Lab of Artificial Intelligence, Beijing, 100083, China
Zhi-Yong Huang, Song-Lu Chen, Qi Liu, Chong-Jian Zhang, Feng Chen & Xu-Cheng Yin
EEasy Technology Company Ltd., Zhuhai, 519000, China
Feng Chen

Corresponding author

Correspondence to Xu-Cheng Yin.

Editor information

Editors and Affiliations

University of Bergen, Bergen, Norway
Duc-Tien Dang-Nguyen
Dublin City University, Dublin, Ireland
Cathal Gurrin
Radboud University Nijmegen, Nijmegen, The Netherlands
Martha Larson
Dublin City University, Dublin, Ireland
Alan F. Smeaton
University of Amsterdam, Amsterdam, The Netherlands
Stevan Rudinac
National Institute of Information and Communications Technology, Tokyo, Japan
Minh-Son Dao
Department of Information Science and Media Studies, University of Bergen, Bergen, Norway
Christoph Trattner
La Trobe University, Melbourne, VIC, Australia
Phoebe Chen

About this paper

Cite this paper

Huang, ZY., Chen, SL., Liu, Q., Zhang, CJ., Chen, F., Yin, XC. (2023). LiteHandNet: A Lightweight Hand Pose Estimation Network via Structural Feature Enhancement. In: Dang-Nguyen, DT., et al. MultiMedia Modeling. MMM 2023. Lecture Notes in Computer Science, vol 13833. Springer, Cham. https://doi.org/10.1007/978-3-031-27077-2_25

DOI: https://doi.org/10.1007/978-3-031-27077-2_25
Published: 29 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-27076-5
Online ISBN: 978-3-031-27077-2
eBook Packages: Computer Science, Computer Science (R0)
