Abstract
This paper proposes a cascaded, parameter-parsimonious 3D hand pose estimation strategy that improves real-time performance without sacrificing accuracy. The estimation process is first decomposed into feature extraction and feature exploitation. Feature extraction is treated as a dimension reduction process, in which convolutional neural networks (CNNs) are used to ensure accuracy. Feature exploitation is treated as a policy optimization process, and a shallow reinforcement learning (RL)-based feature exploitation module is proposed to improve running speed. Ablation studies and experiments are carried out on the NYU and ICVL datasets to evaluate the performance of the strategy, and multiple baselines are used to evaluate its generalization. The results show that the proposed strategy improves testing time by 8.1% and 14.6%. The overall accuracy also reaches the state of the art, which further demonstrates the effectiveness of the proposed strategy.
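The two-stage decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the random projection standing in for the CNN, the layer sizes, the six discrete actions, and every name below (`extract_features`, `ShallowQNetwork`, `greedy_action`, `td_target`) are assumptions chosen only to show how a small Q-network might act on compact features.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(depth_image, proj):
    """Stand-in for the CNN stage (assumption): reduce an image to a
    low-dimensional feature vector with a fixed projection."""
    return np.tanh(proj @ depth_image.ravel())

class ShallowQNetwork:
    """Hypothetical one-hidden-layer Q-network: few parameters, cheap forward pass."""

    def __init__(self, feat_dim, hidden_dim, n_actions, rng):
        self.w1 = rng.normal(0.0, 0.1, (hidden_dim, feat_dim))
        self.b1 = np.zeros(hidden_dim)
        self.w2 = rng.normal(0.0, 0.1, (n_actions, hidden_dim))
        self.b2 = np.zeros(n_actions)

    def q_values(self, features):
        # ReLU hidden layer followed by a linear layer, one Q-value per action.
        hidden = np.maximum(0.0, self.w1 @ features + self.b1)
        return self.w2 @ hidden + self.b2

    def num_parameters(self):
        return sum(p.size for p in (self.w1, self.b1, self.w2, self.b2))

def greedy_action(qnet, features):
    """Feature exploitation as a policy: pick the action with the highest Q-value."""
    return int(np.argmax(qnet.q_values(features)))

def td_target(reward, next_q, gamma=0.9, done=False):
    """Standard one-step Q-learning target used to train a DQN."""
    return reward if done else reward + gamma * float(np.max(next_q))

# Usage with synthetic data (the sizes are illustrative assumptions).
image = rng.random((32, 32))                      # fake depth image
proj = rng.normal(0.0, 0.05, (16, 32 * 32))       # fake "CNN" weights
features = extract_features(image, proj)          # 1024 values -> 16 features
qnet = ShallowQNetwork(feat_dim=16, hidden_dim=8, n_actions=6, rng=rng)
action = greedy_action(qnet, features)
print(features.shape, qnet.num_parameters(), action)
```

The point of the sketch is the split itself: the expensive stage compresses the input once, and the exploitation stage stays shallow so that its forward pass adds little to the running time.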
This work was supported in part by the Basic Ability Promotion Project for Young and Middle-aged Teachers in Guangxi's Colleges and Universities (Grant No. 2022KY0008), in part by the Special Fund of Guangxi Bagui Scholars, and in part by the National Natural Science Foundation of China (Grant No. 61720106009).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, M., Li, S., Shuang, F., Luo, K. (2023). Cascading CNNs with S-DQN: A Parameter-Parsimonious Strategy for 3D Hand Pose Estimation. In: Dang-Nguyen, DT., et al. MultiMedia Modeling. MMM 2023. Lecture Notes in Computer Science, vol 13833. Springer, Cham. https://doi.org/10.1007/978-3-031-27077-2_28
DOI: https://doi.org/10.1007/978-3-031-27077-2_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-27076-5
Online ISBN: 978-3-031-27077-2
eBook Packages: Computer Science, Computer Science (R0)