Cascading CNNs with S-DQN: A Parameter-Parsimonious Strategy for 3D Hand Pose Estimation

Chen, Mingqi; Li, Shaodong; Shuang, Feng; Luo, Kai

doi:10.1007/978-3-031-27077-2_28

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13833)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

This paper proposes a cascaded, parameter-parsimonious strategy for 3D hand pose estimation that improves real-time performance without sacrificing accuracy. The estimation process is first decomposed into feature extraction and feature exploitation. Feature extraction is treated as a dimension reduction process, in which convolutional neural networks (CNNs) are used to ensure accuracy. Feature exploitation is treated as a policy optimization process, and a shallow reinforcement learning (RL)-based feature exploitation module is proposed to improve running speed. Ablation studies and experiments are carried out on the NYU and ICVL datasets to evaluate the performance of the strategy, and multiple baselines are used to evaluate generalization. The results show that the proposed strategy improves testing time by 8.1% and 14.6%. The overall accuracy also reaches the state of the art, which further demonstrates the effectiveness of the proposed strategy.
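The two-stage decomposition described in the abstract can be illustrated with a minimal sketch of the data flow. This is not the authors' implementation: the abstract does not specify the S-DQN architecture, state, action space, or reward, so everything below is an assumption made for illustration. Average pooling stands in for the CNN feature extractor, the hypothetical `ShallowQNet` is an untrained one-hidden-layer Q-network, and the three-way action set (decrease, stay, increase) applied to a single scalar joint coordinate is invented for the example.

```python
import random


def extract_features(depth_map, pool=2):
    """Stand-in for the CNN stage (assumption): average-pool a 2D depth map
    into a flat feature vector, i.e. a simple dimension reduction."""
    h, w = len(depth_map), len(depth_map[0])
    feats = []
    for r in range(0, h - pool + 1, pool):
        for c in range(0, w - pool + 1, pool):
            block = [depth_map[r + i][c + j] for i in range(pool) for j in range(pool)]
            feats.append(sum(block) / len(block))
    return feats


class ShallowQNet:
    """Hypothetical shallow Q-network: one hidden ReLU layer that maps a
    state vector to one Q-value per discrete action. Weights are random
    and untrained; only the structure is illustrated."""

    def __init__(self, n_in, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_actions)]
        self.b2 = [0.0] * n_actions

    def q_values(self, state):
        hidden = [
            max(0.0, sum(w * x for w, x in zip(row, state)) + b)
            for row, b in zip(self.w1, self.b1)
        ]
        return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(self.w2, self.b2)]

    def num_params(self):
        """Total number of weights and biases in both layers."""
        return (
            sum(len(row) for row in self.w1) + len(self.b1)
            + sum(len(row) for row in self.w2) + len(self.b2)
        )


def greedy_action(qnet, state):
    """Pick the action with the highest Q-value (pure exploitation)."""
    q = qnet.q_values(state)
    return max(range(len(q)), key=q.__getitem__)


def refine(qnet, features, coord, steps=5, step_size=1.0):
    """Iteratively adjust one scalar coordinate.
    Assumed actions: 0 = decrease, 1 = stay, 2 = increase."""
    for _ in range(steps):
        action = greedy_action(qnet, features + [coord])
        coord += (action - 1) * step_size
    return coord


# Toy 4x4 "depth map" and a single coordinate initialised at 0.0.
depth_map = [[float(r + c) for c in range(4)] for r in range(4)]
features = extract_features(depth_map)
qnet = ShallowQNet(n_in=len(features) + 1, n_hidden=8, n_actions=3)
estimate = refine(qnet, features, coord=0.0)
print(features, qnet.num_params(), estimate)
```

Because the exploitation stage in this sketch is a single small hidden layer, its parameter count stays low compared with a deep regression head, which is the kind of saving a parameter-parsimonious design aims for. A trained version would need a reward signal and a Q-learning update, which the abstract does not describe.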

This work was supported in part by the funding of basic ability promotion project for young and middle-aged teachers in Guangxi’s colleges and universities (Grant No. 2022KY0008), in part by special fund of Guangxi Bagui Scholars, and in part by National Natural Science Foundation of China (Grant No. 61720106009).

Author information

Authors and Affiliations

Guangxi Key Laboratory of Intelligent Control and Maintenance of Power Equipment, School of Electrical Engineering, Guangxi University, Nanning, 530004, China
Mingqi Chen, Shaodong Li, Feng Shuang & Kai Luo

Corresponding author

Correspondence to Shaodong Li.

Editor information

Editors and Affiliations

University of Bergen, Bergen, Norway
Duc-Tien Dang-Nguyen
Dublin City University, Dublin, Ireland
Cathal Gurrin
Radboud University Nijmegen, Nijmegen, The Netherlands
Martha Larson
Dublin City University, Dublin, Ireland
Alan F. Smeaton
University of Amsterdam, Amsterdam, The Netherlands
Stevan Rudinac
National Institute of Information and Communications Technology, Tokyo, Japan
Minh-Son Dao
Department of Information Science and Media Studies, University of Bergen, Bergen, Norway
Christoph Trattner
La Trobe University, Melbourne, VIC, Australia
Phoebe Chen

About this paper

Cite this paper

Chen, M., Li, S., Shuang, F., Luo, K. (2023). Cascading CNNs with S-DQN: A Parameter-Parsimonious Strategy for 3D Hand Pose Estimation. In: Dang-Nguyen, DT., et al. MultiMedia Modeling. MMM 2023. Lecture Notes in Computer Science, vol 13833. Springer, Cham. https://doi.org/10.1007/978-3-031-27077-2_28

DOI: https://doi.org/10.1007/978-3-031-27077-2_28
Published: 29 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-27076-5
Online ISBN: 978-3-031-27077-2
eBook Packages: Computer Science, Computer Science (R0)
