Abstract
This paper investigates the estimation of motion parameters from 3D hand joint positions. We formulate the task as an inverse kinematics problem with biomechanical constraints and propose a fast and robust iterative approach to solve the constrained optimization. The approach uses a coordinate descent algorithm that decomposes the problem into a sequence of decisions on the transformation around each kinematic node (i.e., joint), where the decision for each node is equivalent to a point matching problem. Solving the whole optimization then amounts to visiting the nodes of the kinematic tree one by one, from the root to the leaves. This decomposition not only accelerates the process but also improves the accuracy of the inverse kinematics solution. Experiments show that our approach yields results comparable to, and in some cases better than, those of state-of-the-art methods.
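To make the decomposition concrete, the following is a minimal sketch of the idea rather than the paper's implementation. It assumes a planar (2D) chain instead of a full 3D hand, a sum of squared joint-position errors as the objective, and simple angle clamping as a stand-in for the biomechanical constraints. The names `forward`, `error`, `best_rotation`, and `solve` are hypothetical. Each joint is visited from root to leaf, and its update is a closed-form 2D point matching problem: find the rotation about the joint that best aligns its descendants with their target positions.

```python
import math

def forward(lengths, angles):
    """Joint positions of a planar chain rooted at the origin."""
    pts, x, y, heading = [(0.0, 0.0)], 0.0, 0.0, 0.0
    for length, angle in zip(lengths, angles):
        heading += angle  # angles are relative to the parent bone
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        pts.append((x, y))
    return pts

def error(lengths, angles, targets):
    """Sum of squared distances between posed joints and target joints."""
    pts = forward(lengths, angles)
    return sum((px - tx) ** 2 + (py - ty) ** 2
               for (px, py), (tx, ty) in zip(pts, targets))

def best_rotation(pivot, current, targets):
    """2D point matching: angle that best rotates `current` onto `targets` about `pivot`."""
    s = c = 0.0
    for (px, py), (tx, ty) in zip(current, targets):
        ax, ay = px - pivot[0], py - pivot[1]
        bx, by = tx - pivot[0], ty - pivot[1]
        c += ax * bx + ay * by  # accumulated dot products
        s += ax * by - ay * bx  # accumulated 2D cross products
    return math.atan2(s, c)

def solve(lengths, targets, limits, sweeps=50):
    """Coordinate descent over joint angles, visiting joints from root to leaf."""
    angles = [0.0] * len(lengths)
    for _ in range(sweeps):
        for j in range(len(lengths)):
            pts = forward(lengths, angles)
            # Only the descendants of joint j move when its angle changes.
            delta = best_rotation(pts[j], pts[j + 1:], targets[j + 1:])
            lo, hi = limits[j]
            # Assumed constraint model: clamp the angle to its joint limits.
            angles[j] = min(hi, max(lo, angles[j] + delta))
    return angles

# Illustrative data (not from the paper): a three-bone "finger".
lengths = [4.0, 3.0, 2.0]
limits = [(-math.pi / 2, math.pi / 2)] * 3
targets = forward(lengths, [0.3, 0.5, 0.4])
angles = solve(lengths, targets, limits)
print(error(lengths, [0.0] * 3, targets), error(lengths, angles, targets))
```

In this toy setting every per-joint update cannot increase the objective, because the clamped angle always lies between the current angle and the unconstrained optimum. How many sweeps are needed depends on the chain. A 3D version would replace the `atan2` step with a rotation fit such as an orthogonal Procrustes solution.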
Funding
This work was partially supported by NSFC (61972160, 62072191), Guangdong Basic and Applied Basic Research Foundation (2021A1515012301, 2019A1515010833, 2019A1515010860), and the Fundamental Research Funds for the Central Universities (2020ZYGXZR089, D2190670).
Ethics declarations
Conflicts of interest
The authors declare that they have no conflicts of interest.
About this article
Cite this article
Li, G., Wu, Z., Liu, Y. et al. 3D hand reconstruction from a single image based on biomechanical constraints. Vis Comput 37, 2699–2711 (2021). https://doi.org/10.1007/s00371-021-02250-y
DOI: https://doi.org/10.1007/s00371-021-02250-y