Abstract
Data-driven grasp planning can generate anthropomorphic grasps, providing controllers with robust and natural responses to environmental changes or morphological discrepancies. Motion capture (mocap) data, the most widely used source of motion data, provides high-fidelity dynamic motions. However, it is challenging for non-professionals to get started quickly and collect sufficient mocap data for grasp training. Furthermore, current grasp planning approaches suffer from limited adaptability and thus cannot be applied directly to objects of different shapes and sizes. In this paper, we propose the first framework, to the best of our knowledge, for fast and easy design of grasping controllers with kinematic algorithms based on monocular 3D hand pose estimation and deep reinforcement learning, leveraging abundant and flexible videos of desired grasps. Specifically, we first obtain raw grasping sequences through 3D hand pose estimation from given monocular video fragments. We then reconstruct the motion sequences by smoothing them with a peak-clipping filter, and further optimize them using the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). Finally, we integrate the reference motion with the adaptive grasping controller through deep reinforcement learning. Quantitative and qualitative results demonstrate that our framework can easily generate natural and stable grasps from monocular video demonstrations and adds adaptability to primitive objects of different shapes and sizes in the target object library.
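The peak-clipping smoothing step mentioned above can be illustrated with a minimal sketch. The paper does not specify its exact filter, so the deviation test and the one-dimensional signal below are illustrative assumptions: a sample that deviates from the mean of its two neighbors by more than a threshold is treated as an estimation spike and replaced by that mean.

```python
def peak_clip(signal, threshold):
    """Clip isolated spikes in a 1-D sequence (e.g., one joint angle over
    time): if a sample deviates from the mean of its two neighbors by more
    than `threshold`, replace it with that neighbor mean. Endpoints are
    left untouched. This is an illustrative filter, not the paper's exact one."""
    out = list(signal)
    for i in range(1, len(signal) - 1):
        neighbor_mean = 0.5 * (signal[i - 1] + signal[i + 1])
        if abs(signal[i] - neighbor_mean) > threshold:
            out[i] = neighbor_mean
    return out

# A pose-estimation glitch at index 2 is clipped back toward its neighbors.
smoothed = peak_clip([0.0, 0.1, 5.0, 0.2, 0.3], threshold=1.0)
```

In a full pipeline this filter would be applied per joint channel of the estimated hand pose sequence before any trajectory optimization.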
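The CMA-ES optimization step can be sketched with a deliberately simplified stand-in: a plain (mu, lambda) evolution strategy with an isotropic Gaussian and a fixed step-size decay. Full CMA-ES additionally adapts a covariance matrix and step size, which is omitted here for brevity. The objective and the three-dimensional "pose" vector are toy assumptions, not the paper's actual cost function.

```python
import random

def simple_es(objective, x0, sigma=0.5, pop=20, elite=5, iters=100, seed=0):
    """Minimal (mu, lambda) evolution strategy: sample `pop` candidates around
    the current mean, keep the best `elite`, and recombine them into a new
    mean. A simplified stand-in for CMA-ES, which would also adapt the
    sampling covariance and step size from the elite set."""
    rng = random.Random(seed)
    mean = list(x0)
    for _ in range(iters):
        scored = []
        for _ in range(pop):
            cand = [m + sigma * rng.gauss(0, 1) for m in mean]
            scored.append((objective(cand), cand))
        scored.sort(key=lambda t: t[0])                # lower cost is better
        best = [c for _, c in scored[:elite]]
        mean = [sum(vals) / elite for vals in zip(*best)]
        sigma *= 0.95  # fixed decay in place of CMA-ES step-size control
    return mean

# Toy objective: squared deviation of a candidate pose vector from a reference.
reference = [0.3, -0.2, 0.8]
loss = lambda x: sum((a - b) ** 2 for a, b in zip(x, reference))
result = simple_es(loss, [0.0, 0.0, 0.0])
```

In the framework described above, the optimized variables would parameterize the reconstructed grasp trajectory, and the objective would score its physical plausibility and fidelity to the video-estimated motion.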
Data Availability
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.
Code Availability
The code is available at https://zhiyongsu.github.io.
Acknowledgments
This work was supported by the National Key R&D Program of China (grant number 2018YFB1004904) and the Fundamental Research Funds for the Central Universities (grant number 30918012203).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wu, Y., Zhang, Z., Qiu, D. et al. Video driven adaptive grasp planning of virtual hand using deep reinforcement learning. Multimed Tools Appl 82, 16301–16322 (2023). https://doi.org/10.1007/s11042-022-14190-3
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-022-14190-3