
Video driven adaptive grasp planning of virtual hand using deep reinforcement learning

Published in: Multimedia Tools and Applications

Abstract

Data-driven grasp planning can generate anthropomorphic grasps, providing controllers with robust and natural responses to environmental changes or morphological discrepancies. Motion capture (mocap) data, the most widely used source of motion data, can provide high-fidelity dynamic motions. However, it is challenging for non-professionals to get started quickly and to collect sufficient mocap data for grasp training. Furthermore, current grasp planning approaches suffer from limited adaptive ability and thus cannot be applied directly to objects of different shapes and sizes. In this paper, we propose the first framework, to the best of our knowledge, for fast and easy design of grasping controllers with kinematic algorithms based on monocular 3D hand pose estimation and deep reinforcement learning, leveraging abundant and flexible videos of desired grasps. Specifically, we first obtain raw grasping sequences through 3D hand pose estimation from given monocular video fragments. Then, we reconstruct the motion sequences using data smoothing based on a peak clipping filter, and further optimize them using CMA-ES (Covariance Matrix Adaptation Evolution Strategy). Finally, we integrate the reference motion with the adaptive grasping controller through deep reinforcement learning. Quantitative and qualitative results demonstrate that our framework can easily generate natural and stable grasps from monocular video demonstrations, and can adapt to primitive objects of different shapes and sizes in the target object library.
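To make the smoothing and optimization stages of this pipeline concrete, the following is a minimal sketch in Python, assuming per-frame joint angles stored in a NumPy array. It clamps frame-to-frame jumps with a peak clipping filter and then refines the sequence with CMA-ES via the open-source `cma` package. The function names, threshold, and cost terms are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
import cma  # pip install cma; open-source CMA-ES implementation


def clip_peaks(angles, max_step=0.15):
    """Peak clipping smoother (illustrative): clamp the frame-to-frame change
    of each joint angle so isolated pose-estimation spikes are removed.

    angles   : (T, J) array of per-frame joint angles in radians
    max_step : largest allowed change between consecutive frames (assumed value)
    """
    smoothed = angles.copy()
    for t in range(1, len(smoothed)):
        delta = smoothed[t] - smoothed[t - 1]
        smoothed[t] = smoothed[t - 1] + np.clip(delta, -max_step, max_step)
    return smoothed


def refine_with_cmaes(smoothed, sigma0=0.1):
    """Refine the smoothed reference motion with CMA-ES.

    The cost below is a placeholder (fidelity to the smoothed estimate plus a
    temporal-smoothness penalty); the paper's actual objective is not shown here.
    """
    shape = smoothed.shape

    def motion_cost(flat_params):
        motion = flat_params.reshape(shape)
        fidelity = np.sum((motion - smoothed) ** 2)        # stay close to the estimate
        smoothness = np.sum(np.diff(motion, axis=0) ** 2)  # penalise residual jitter
        return fidelity + 0.5 * smoothness

    es = cma.CMAEvolutionStrategy(smoothed.flatten(), sigma0, {"maxiter": 200})
    es.optimize(motion_cost)
    return es.result.xbest.reshape(shape)


# Example usage on a synthetic noisy sequence (30 frames, 5 joint angles)
raw = np.cumsum(np.random.randn(30, 5) * 0.05, axis=0)
reference_motion = refine_with_cmaes(clip_peaks(raw))
```

In the paper's setting, the resulting reference motion would then be tracked by the deep reinforcement learning controller; the sketch only covers the pre-processing stages described in the abstract.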


Data Availability

Data sharing not applicable to this article as no datasets were generated or analysed during the current study.

Code Availability

The code is available at https://zhiyongsu.github.io.


Acknowledgments

This work was supported by the National Key R&D Program of China (grant number 2018YFB1004904) and the Fundamental Research Funds for the Central Universities (grant number 30918012203).

Author information


Corresponding author

Correspondence to Zhiyong Su.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wu, Y., Zhang, Z., Qiu, D. et al. Video driven adaptive grasp planning of virtual hand using deep reinforcement learning. Multimed Tools Appl 82, 16301–16322 (2023). https://doi.org/10.1007/s11042-022-14190-3

