ASCS-Reinforcement Learning: A Cascaded Framework for Accurate 3D Hand Pose Estimation

Published: 12 June 2023

Abstract

3D hand pose estimation can be achieved by cascading a feature extraction module and a feature exploitation module, where reinforcement learning (RL) has proved to be an effective way to perform feature exploitation. This paper points out the prospects of improving accuracy with a better exploitation strategy, and proposes an Adaptive Step-Critic Shared RL (ASCS-RL) strategy for accurate feature exploitation in 3D hand pose estimation. Hand joint features are exploited in a multi-task manner and divided into two groups according to the distributions of estimation error. An RL-based adaptive-step (AS-RL) strategy is then used to obtain the optimal step size for better exploitation. The exploitation process is finally performed using a critic-shared RL (CS-RL) strategy, in which both groups share a universal critic mechanism. Ablation studies and extensive experiments are carried out to evaluate the performance of ASCS-RL on the ICVL and NYU datasets. The results show that the strategy achieves state-of-the-art accuracy in monocular depth-based 3D hand pose estimation, with the best results on ICVL. Experiments also validate that ASCS-RL achieves a better tradeoff between accuracy and running speed.
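The grouping step mentioned in the abstract can be pictured with a minimal sketch. The abstract does not state the actual grouping criterion, so everything below is an assumption made for illustration: the function name `split_joints_by_error`, the use of the mean error per joint, and the median as the split point are all hypothetical and are not taken from the paper.

```python
from statistics import mean, median


def split_joints_by_error(errors):
    """Split joint indices into a low-error and a high-error group.

    errors: list of samples, each a list of per-joint estimation errors.
    The median-of-means threshold is an assumption of this sketch,
    not the criterion used by ASCS-RL.
    """
    num_joints = len(errors[0])
    # Mean estimation error of each joint across all samples.
    mean_err = [mean(sample[j] for sample in errors) for j in range(num_joints)]
    # Hypothetical split point: the median of the per-joint means.
    threshold = median(mean_err)
    low = [j for j, e in enumerate(mean_err) if e <= threshold]
    high = [j for j, e in enumerate(mean_err) if e > threshold]
    return low, high


# Toy data: 2 samples, 4 joints (values are made up for illustration).
errors = [[1.0, 5.0, 2.0, 8.0], [3.0, 7.0, 2.0, 10.0]]
low_group, high_group = split_joints_by_error(errors)
print(low_group, high_group)  # prints: [0, 2] [1, 3]
```

In a multi-task setup such as the one the abstract describes, each group would then be handled by its own exploitation branch; how the paper actually forms and uses the groups should be taken from the full text.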

Supplemental Material

MP4 File
Supplementary video of 3D hand pose estimation via the proposed ASCS-RL strategy.

Cited By

  • (2024) Decoupling Heterogeneous Features for Robust 3D Interacting Hand Poses Estimation. Proceedings of the 32nd ACM International Conference on Multimedia, 5338-5346. DOI: 10.1145/3664647.3681068. Online publication date: 28-Oct-2024

Published In

ICMR '23: Proceedings of the 2023 ACM International Conference on Multimedia Retrieval
June 2023
694 pages
ISBN: 9798400701788
DOI: 10.1145/3591106
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. adaptive step
  2. hand pose estimation
  3. multi-task learning
  4. shared reinforcement learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Data Availability

Supplementary video of 3D hand pose estimation via the proposed ASCS-RL strategy. https://dl.acm.org/doi/10.1145/3591106.3592215#supp_video.mp4

Conference

ICMR '23
Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%

Article Metrics

  • Downloads (last 12 months): 47
  • Downloads (last 6 weeks): 5
Reflects downloads up to 20 Feb 2025
