research-article

ASCS-Reinforcement Learning: A Cascaded Framework for Accurate 3D Hand Pose Estimation

Authors:

Xi Liu

ICMR '23: Proceedings of the 2023 ACM International Conference on Multimedia Retrieval

Pages 335 - 342

https://doi.org/10.1145/3591106.3592215

Published: 12 June 2023

Abstract

3D hand pose estimation can be achieved by cascading a feature extraction module and a feature exploitation module, and reinforcement learning (RL) has proven to be an effective way to perform feature exploitation. This paper highlights the potential for improving accuracy through a better exploitation strategy and proposes an Adaptive Step-Critic Shared RL (ASCS-RL) strategy for accurate feature exploitation in 3D hand pose estimation. Hand joint features are exploited in a multi-task manner and divided into two groups according to the distribution of their estimation errors. An RL-based adaptive-step (AS-RL) strategy is then used to obtain the optimal step size for better exploitation. Finally, the exploitation process is performed using a critic-shared RL (CS-RL) strategy, in which both groups share a universal critic mechanism. Ablation studies and extensive experiments are carried out to evaluate the performance of ASCS-RL on the ICVL and NYU datasets. The results show that the strategy achieves state-of-the-art accuracy in monocular depth-based 3D hand pose estimation, with the best results on ICVL. Experiments also validate that ASCS-RL achieves a better trade-off between accuracy and running speed.
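The abstract names three ingredients: grouping joints by estimation error, an RL agent that chooses its own step size, and a critic shared by both groups. The sketch below is a toy illustration of how those ideas can fit together under stated assumptions; it is not the authors' implementation. Assumptions (none of these come from the paper): joints are split at the median error, refinement of a single coordinate is framed as an episodic task in which each action picks both a direction and a step size from a hypothetical candidate set `STEPS`, the state is a coarse bucket of the residual (standing in for the image features a real system would observe), and a one-step actor-critic keeps a separate policy table per joint group while both groups update and bootstrap from one shared value table. All names (`split_by_error`, `bucket`, `train`, `refine`, `STEPS`) are hypothetical.

```python
import math
import random
from statistics import median

# Toy assumptions (not from the paper): residuals in mm, three hypothetical step
# sizes, one tabular softmax actor per joint group, ONE critic shared by both.
STEPS = (1.0, 4.0, 16.0)
ACTIONS = [(d, s) for d in (-1, 1) for s in STEPS]  # (direction, step size) pairs
GAMMA, ALPHA_ACTOR, ALPHA_CRITIC = 0.9, 0.05, 0.1


def split_by_error(joint_errors):
    """Split joint indices into high- and low-error groups at the median error.

    The median threshold is a hypothetical stand-in for the paper's grouping rule.
    """
    m = median(joint_errors)
    hard = [j for j, e in enumerate(joint_errors) if e > m]
    easy = [j for j, e in enumerate(joint_errors) if e <= m]
    return hard, easy


def bucket(residual):
    """Toy state: sign and coarse magnitude of the residual (stands in for features)."""
    mag = abs(residual)
    level = 0 if mag < 2 else 1 if mag < 8 else 2 if mag < 32 else 3
    return (residual > 0, level)


def sample_action(prefs, rng):
    """Sample an action index from a softmax over preferences; also return the probs."""
    top = max(prefs)
    weights = [math.exp(p - top) for p in prefs]
    total = sum(weights)
    probs = [w / total for w in weights]
    return rng.choices(range(len(prefs)), weights=probs)[0], probs


def train(episodes=2000, horizon=8, seed=0):
    """One-step actor-critic with per-group actors and a single shared critic."""
    rng = random.Random(seed)
    critic = {}                           # shared state-value table V(s)
    actors = {"hard": {}, "easy": {}}     # per-group action-preference tables
    spread = {"hard": 40.0, "easy": 6.0}  # hypothetical initial error ranges (mm)
    for _ in range(episodes):
        for group, actor in actors.items():
            residual = rng.uniform(-spread[group], spread[group])
            for t in range(horizon):
                s = bucket(residual)
                prefs = actor.setdefault(s, [0.0] * len(ACTIONS))
                a, probs = sample_action(prefs, rng)
                direction, step = ACTIONS[a]
                new_residual = residual + direction * step
                reward = abs(residual) - abs(new_residual)  # error reduction
                next_value = 0.0 if t == horizon - 1 else critic.get(bucket(new_residual), 0.0)
                # TD error computed from the shared critic, used by both groups.
                delta = reward + GAMMA * next_value - critic.get(s, 0.0)
                critic[s] = critic.get(s, 0.0) + ALPHA_CRITIC * delta
                for b in range(len(ACTIONS)):  # softmax policy-gradient update
                    indicator = 1.0 if b == a else 0.0
                    prefs[b] += ALPHA_ACTOR * delta * (indicator - probs[b])
                residual = new_residual
    return actors, critic


def refine(actor, residual, horizon=8):
    """Apply the greedy learned action for a fixed number of refinement steps."""
    for _ in range(horizon):
        prefs = actor.get(bucket(residual), [0.0] * len(ACTIONS))
        direction, step = ACTIONS[max(range(len(ACTIONS)), key=prefs.__getitem__)]
        residual += direction * step
    return residual


if __name__ == "__main__":
    hard, easy = split_by_error([12.0, 3.0, 25.0, 4.0])  # hypothetical per-joint errors
    print(hard, easy)  # [0, 2] [1, 3]
    actors, shared_critic = train()
    print(refine(actors["hard"], 30.0))  # refined residual; value depends on training
```

Sharing the critic means transitions from both groups refine the same value estimates, while the group-specific actors remain free to learn different step preferences, such as larger steps for the high-error group. Whether this matches the network design used in ASCS-RL cannot be confirmed from the abstract alone.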

Supplemental Material

MP4 File (66.67 MB)

Supplementary video of 3D hand pose estimation via the proposed ASCS-RL strategy.

Cited By

Yao H, Ding C, Xu X, Lin Z, Cai J, Kankanhalli M, Prabhakaran B, Boll S, Subramanian R, Zheng L, Singh V, Cesar P, Xie L, Xu D (2024). Decoupling Heterogeneous Features for Robust 3D Interacting Hand Poses Estimation. Proceedings of the 32nd ACM International Conference on Multimedia, 5338–5346. DOI: 10.1145/3664647.3681068. Online publication date: 28 October 2024.
https://dl.acm.org/doi/10.1145/3664647.3681068

Index Terms

ASCS-Reinforcement Learning: A Cascaded Framework for Accurate 3D Hand Pose Estimation
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
2. Human-centered computing
  1. Human computer interaction (HCI)
    1. Interaction techniques

Information

Published In

ICMR '23: Proceedings of the 2023 ACM International Conference on Multimedia Retrieval

June 2023

694 pages

ISBN:9798400701788

DOI:10.1145/3591106

Editors:
Ioannis (Yiannis) Kompatsiaris, Centre for Research and Technology Hellas, Greece
Jiebo Luo, University of Rochester, USA
Nicu Sebe, University of Trento, Italy
Angela Yao, National University of Singapore, Singapore
Vasileios Mezaris, Centre for Research and Technology Hellas, Greece
Symeon Papadopoulos, Centre for Research and Technology Hellas, Greece
Adrian Popescu, CEA LIST, France
Zi (Helen) Huang, University of Queensland, Australia

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Data Availability

Supplementary video of 3D hand pose estimation via the proposed ASCS-RL strategy. https://dl.acm.org/doi/10.1145/3591106.3592215#supp_video.mp4

Funding Sources

Natural Science Foundation of Guangxi Province
Bagui Scholars Program of Guangxi Zhuang Autonomous Region
Middle-aged and Young Teachers' Basic Ability Promotion Project of Guangxi

Conference

ICMR '23

Sponsor:

SIGMM

ICMR '23: International Conference on Multimedia Retrieval

June 12 - 15, 2023

Thessaloniki, Greece

Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%

Bibliometrics

Article Metrics

Total Citations: 1
Total Downloads: 121
Downloads (last 12 months): 47
Downloads (last 6 weeks): 5

Reflects downloads up to 20 Feb 2025