Cascading CNNs with S-DQN: A Parameter-Parsimonious Strategy for 3D Hand Pose Estimation

  • Conference paper
MultiMedia Modeling (MMM 2023)

Abstract

This paper proposes a cascaded, parameter-parsimonious strategy for 3D hand pose estimation that improves real-time performance without sacrificing accuracy. The estimation process is first decomposed into feature extraction and feature exploitation. Feature extraction is treated as a dimension reduction process, in which convolutional neural networks (CNNs) are used to ensure accuracy. Feature exploitation is treated as a policy optimization process, and a shallow reinforcement learning (RL)-based feature exploitation module is proposed to speed up inference. Ablation studies and experiments are carried out on the NYU and ICVL datasets to evaluate the performance of the strategy, and multiple baselines are used to evaluate its generalization. The results show that the proposed strategy improves testing time by 8.1% and 14.6%. The overall accuracy also reaches the state of the art, which further demonstrates the effectiveness of the proposed strategy.
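The two-stage decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that S-DQN denotes a shallow deep Q-network (which the abstract suggests but does not spell out), it replaces the CNN with simple average pooling, and the names `extract_features`, `ShallowQNet`, and `td_target`, the layer sizes, and the six-action space are all hypothetical. The temporal-difference target is the standard Q-learning one, r + γ · max Q(s', a').

```python
import random


def extract_features(depth_patch, pool=4):
    # Stand-in for the CNN stage (an assumption, not the paper's network):
    # average-pool a 2D depth patch into a short feature vector.
    rows, cols = len(depth_patch), len(depth_patch[0])
    features = []
    for r in range(0, rows, pool):
        for c in range(0, cols, pool):
            block = [depth_patch[i][j]
                     for i in range(r, min(r + pool, rows))
                     for j in range(c, min(c + pool, cols))]
            features.append(sum(block) / len(block))
    return features


class ShallowQNet:
    # Hypothetical shallow Q-network: one ReLU hidden layer, no biases,
    # mapping a feature vector to one Q-value per discrete action.
    def __init__(self, n_in, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
                   for _ in range(n_actions)]

    def q_values(self, features):
        hidden = [max(0.0, sum(w * x for w, x in zip(row, features)))
                  for row in self.w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]

    def greedy_action(self, features):
        # Policy step: pick the action with the highest Q-value.
        q = self.q_values(features)
        return max(range(len(q)), key=q.__getitem__)

    def parameter_count(self):
        return (sum(len(row) for row in self.w1)
                + sum(len(row) for row in self.w2))


def td_target(reward, next_q_values, gamma=0.9):
    # Standard Q-learning target: r + gamma * max_a' Q(s', a').
    return reward + gamma * max(next_q_values)


# Toy 8x8 "depth patch"; real inputs would be cropped depth images.
patch = [[float(i + j) for j in range(8)] for i in range(8)]
features = extract_features(patch)            # stage 1: dimension reduction
net = ShallowQNet(n_in=len(features), n_hidden=8, n_actions=6)
action = net.greedy_action(features)          # stage 2: policy on features
print(len(features), net.parameter_count(), action)
```

Because the exploitation module sees only the reduced feature vector, its parameter count scales with the feature length rather than the image size, which is the intuition behind keeping that stage shallow in this sketch.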

This work was supported in part by the Basic Ability Promotion Project for Young and Middle-aged Teachers in Guangxi's Colleges and Universities (Grant No. 2022KY0008), in part by the Special Fund of Guangxi Bagui Scholars, and in part by the National Natural Science Foundation of China (Grant No. 61720106009).

Author information

Corresponding author

Correspondence to Shaodong Li.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Chen, M., Li, S., Shuang, F., Luo, K. (2023). Cascading CNNs with S-DQN: A Parameter-Parsimonious Strategy for 3D Hand Pose Estimation. In: Dang-Nguyen, DT., et al. MultiMedia Modeling. MMM 2023. Lecture Notes in Computer Science, vol 13833. Springer, Cham. https://doi.org/10.1007/978-3-031-27077-2_28

  • DOI: https://doi.org/10.1007/978-3-031-27077-2_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-27076-5

  • Online ISBN: 978-3-031-27077-2

  • eBook Packages: Computer Science, Computer Science (R0)
