Skip to main content

AlignSDF: Pose-Aligned Signed Distance Fields for Hand-Object Reconstruction

  • Conference paper
  • First Online:
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13661))

Included in the following conference series:

  • 4196 Accesses

Abstract

Recent work achieved impressive progress towards joint reconstruction of hands and manipulated objects from monocular color images. Existing methods focus on two alternative representations in terms of either parametric meshes or signed distance fields (SDFs). On one side, parametric models can benefit from prior knowledge at the cost of limited shape deformations and mesh resolutions. Mesh models, hence, may fail to precisely reconstruct details such as contact surfaces of hands and objects. SDF-based methods, on the other side, can represent arbitrary details but are lacking explicit priors. In this work we aim to improve SDF models using priors provided by parametric representations. In particular, we propose a joint learning framework that disentangles the pose and the shape. We obtain hand and object poses from parametric models and use them to align SDFs in 3D space. We show that such aligned SDFs better focus on reconstructing shape details and improve reconstruction accuracy both for hands and objects. We evaluate our method and demonstrate significant improvements over the state of the art on the challenging ObMan and DexYCB benchmarks.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. Baek, S., Kim, K.I., Kim, T.K.: Augmented skeleton space transfer for depth-based hand pose estimation. In: CVPR (2018)

    Google Scholar 

  2. Baek, S., Kim, K.I., Kim, T.K.: Pushing the envelope for RGB-based dense 3D hand pose estimation via neural rendering. In: CVPR (2019)

    Google Scholar 

  3. Ballan, L., Taneja, A., Gall, J., Gool, L.V., Pollefeys, M.: Motion capture of hands in action using discriminative salient points. In: ECCV (2012). https://doi.org/10.1007/978-3-642-33783-3_46

  4. Boukhayma, A., Bem, R.D., Torr, P.H.: 3D hand shape and pose from images in the wild. In: CVPR (2019)

    Google Scholar 

  5. Cai, Y., Ge, L., Cai, J., Thalmann, N.M., Yuan, J.: 3D hand pose estimation using synthetic data and weakly labeled RGB images. In: TPAMI (2020)

    Google Scholar 

  6. Chao, Y.W., et al.: DexYCB: a benchmark for capturing hand grasping of objects. In: CVPR (2021)

    Google Scholar 

  7. Chen, L., Lin, S.Y., Xie, Y., Lin, Y.Y., Xie, X.: MVHM: a large-scale multi-view hand mesh benchmark for accurate 3D hand pose estimation. In: WACV (2021)

    Google Scholar 

  8. Chen, X., et al.: Camera-space hand mesh recovery via semantic aggregation and adaptive 2D–1D registration. In: CVPR (2021)

    Google Scholar 

  9. Chen, X., Wang, G., Zhang, C., Kim, T.K., Ji, X.: SHPR-Net: deep semantic hand pose regression from point clouds. IEEE Access 6, 43425–43439 (2018)

    Article  Google Scholar 

  10. Chen, X., Zheng, Y., Black, M.J., Hilliges, O., Geiger, A.: SNARF: differentiable forward skinning for animating non-rigid neural implicit shapes. In: ICCV (2021)

    Google Scholar 

  11. Chen, Z., Zhang, H.: Learning implicit fields for generative shape modeling. In: CVPR (2019)

    Google Scholar 

  12. Choy, C.B., Xu, D., Gwak, J.Y., Chen, K., Savarese, S.: 3D-R2N2: a unified approach for single and multi-view 3D object reconstruction. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 628–644. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_38

    Chapter  Google Scholar 

  13. Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J.: Active shape models-their training and application. CVIU 61(1), 38–59 (1995)

    Google Scholar 

  14. Deng, B., Lewis, J.P., Jeruzalski, T., Pons-Moll, G., Hinton, G., Norouzi, M., Tagliasacchi, A.: NASA neural articulated shape approximation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12352, pp. 612–628. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58571-6_36

    Chapter  Google Scholar 

  15. Doosti, B., Naha, S., Mirbagheri, M., Crandall, D.: HOPE-Net: a graph-based model for hand-object pose estimation. In: CVPR (2020)

    Google Scholar 

  16. Groueix, T., Fisher, M., Kim, V.G., Russell, B.C., Aubry, M.: A papier-mâché approach to learning 3D surface generation. In: CVPR (2018)

    Google Scholar 

  17. Hamer, H., Gall, J., Weise, T., Van Gool, L.: An object-dependent hand pose prior from sparse training data. In: CVPR (2010)

    Google Scholar 

  18. Hamer, H., Schindler, K., Koller-Meier, E., Van Gool, L.: Tracking a hand manipulating an object. In: ICCV (2009)

    Google Scholar 

  19. Hampali, S., Rad, M., Oberweger, M., Lepetit, V.: HOnnotate: a method for 3D annotation of hand and object poses. In: CVPR (2020)

    Google Scholar 

  20. Hasson, Y., Tekin, B., Bogo, F., Laptev, I., Pollefeys, M., Schmid, C.: Leveraging photometric consistency over time for sparsely supervised hand-object reconstruction. In: CVPR (2020)

    Google Scholar 

  21. Hasson, Y., et al.: Learning joint reconstruction of hands and manipulated objects. In: CVPR (2019)

    Google Scholar 

  22. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)

    Google Scholar 

  23. Heap, T., Hogg, D.: Towards 3D hand tracking using a deformable model. In: FG (1996)

    Google Scholar 

  24. Iqbal, U., Molchanov, P., Gall, T.B.J., Kautz, J.: Hand pose estimation via latent 2.5D heatmap regression. In: ECCV (2018)

    Google Scholar 

  25. Karunratanakul, K., Spurr, A., Fan, Z., Hilliges, O., Tang, S.: A skeleton-driven neural occupancy representation for articulated hands. In: 3DV (2021)

    Google Scholar 

  26. Karunratanakul, K., Yang, J., Zhang, Y., Black, M.J., Muandet, K., Tang, S.: Grasping field: learning implicit representations for human grasps. In: 3DV (2020)

    Google Scholar 

  27. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

  28. Kokic, M., Kragic, D., Bohg, J.: Learning to estimate pose and shape of hand-held objects from RGB images. In: IROS (2019)

    Google Scholar 

  29. Kulon, D., Güler, R.A., Kokkinos, I., Bronstein, M., Zafeiriou, S.: Weakly-supervised mesh-convolutional hand reconstruction in the wild. In: CVPR (2020)

    Google Scholar 

  30. Kulon, D., Wang, H., Güler, R.A., Bronstein, M.M., Zafeiriou, S.: Single image 3D hand reconstruction with mesh convolutions. In: BMVC (2019)

    Google Scholar 

  31. Labbé, Y., Carpentier, J., Aubry, M., Sivic, J.: CosyPose: consistent multi-view multi-object 6D pose estimation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12362, pp. 574–591. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58520-4_34

    Chapter  Google Scholar 

  32. Labbé, Y., Carpentier, J., Aubry, M., Sivic, J.: Single-view robot pose and joint angle estimation via render & compare. In: CVPR (2021)

    Google Scholar 

  33. Li, K., et al.: ArtiBoost: boosting articulated 3D hand-object pose estimation via online exploration and synthesis. In: CVPR (2022)

    Google Scholar 

  34. Li, Y., Wang, G., Ji, X., Xiang, Y., Fox, D.: DeepIM: deep iterative matching for 6D pose estimation. In: ECCV (2018)

    Google Scholar 

  35. Lorensen, W.E., Cline, H.E.: Marching cubes: a high resolution 3D surface construction algorithm. In: TOG (1987)

    Google Scholar 

  36. Mescheder, L., Oechsle, M., Niemeyer, M., Nowozin, S., Geiger, A.: Occupancy networks: learning 3D reconstruction in function space. In: CVPR (2019)

    Google Scholar 

  37. Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: representing scenes as neural radiance fields for view synthesis. In: ECCV (2020)

    Google Scholar 

  38. Moon, G., Chang, J.Y., Lee, K.M.: V2V-PoseNet: voxel-to-voxel prediction network for accurate 3D hand and human pose estimation from a single depth map. In: CVPR (2018)

    Google Scholar 

  39. Moon, G., Shiratori, T., Lee, K.M.: DeepHandMesh: a weakly-supervised deep encoder-decoder framework for high-fidelity hand mesh modeling. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12347, pp. 440–455. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58536-5_26

    Chapter  Google Scholar 

  40. Mueller, F., et al.: GANerated hands for real-time 3D hand tracking from monocular RGB. In: CVPR (2018)

    Google Scholar 

  41. Mueller, F., et al.: Real-time pose and shape reconstruction of two interacting hands with a single depth camera. In: TOG (2019)

    Google Scholar 

  42. Mundy, J.L.: Object recognition in the geometric era: a retrospective. In: Ponce, J., Hebert, M., Schmid, C., Zisserman, A. (eds.) Toward Category-Level Object Recognition. LNCS, vol. 4170, pp. 3–28. Springer, Heidelberg (2006). https://doi.org/10.1007/11957959_1

    Chapter  Google Scholar 

  43. Oikonomidis, I., Kyriazis, N., Argyros, A.A.: Full DOF tracking of a hand interacting with an object by modeling occlusions and physical constraints. In: ICCV (2011)

    Google Scholar 

  44. Panteleris, P., Oikonomidis, I., Argyros, A.: Using a single RGB frame for real time 3D hand pose estimation in the wild. In: WACV (2018)

    Google Scholar 

  45. Park, J.J., Florence, P., Straub, J., Newcombe, R., Lovegrove, S.: DeepSDF: learning continuous signed distance functions for shape representation. In: CVPR (2019)

    Google Scholar 

  46. Pavlakos, G., Zhou, X., Derpanis, K.G., Daniilidis, K.: Coarse-to-fine volumetric prediction for single-image 3D human pose. In: CVPR (2017)

    Google Scholar 

  47. Peng, S., Jiang, C., Liao, Y., Niemeyer, M., Pollefeys, M., Geiger, A.: Shape as points: a differentiable poisson solver. In: NeurIPS (2021)

    Google Scholar 

  48. Qi, C.R., Su, H., Mo, K., Guibas, L.J.: PointNet: deep learning on point sets for 3D classification and segmentation. In: CVPR (2017)

    Google Scholar 

  49. Qian, N., Wang, J., Mueller, F., Bernard, F., Golyanik, V., Theobalt, C.: HTML: a parametric hand texture model for 3D hand reconstruction and personalization. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12356, pp. 54–71. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58621-8_4

    Chapter  Google Scholar 

  50. Rehg, J.M., Kanade, T.: Visual tracking of high DOF articulated structures: an application to human hand tracking. In: Eklundh, J.-O. (ed.) ECCV 1994. LNCS, vol. 801, pp. 35–46. Springer, Heidelberg (1994). https://doi.org/10.1007/BFb0028333

    Chapter  Google Scholar 

  51. Riegler, G., Ulusoy, A.O., Geiger, A.: OctNet: learning deep 3D representations at high resolutions. In: CVPR (2017)

    Google Scholar 

  52. Roberts, L.G.: Machine perception of three-dimensional solids. Ph.D. thesis, Massachusetts Institute of Technology (1963)

    Google Scholar 

  53. Romero, J., Kjellström, H., Kragic, D.: Hands in action: real-time 3D reconstruction of hands in interaction with objects. In: ICRA (2010)

    Google Scholar 

  54. Romero, J., Tzionas, D., Black, M.J.: Embodied hands: modeling and capturing hands and bodies together. In: TOG (2017)

    Google Scholar 

  55. Saito, S., Huang, Z., Natsume, R., Morishima, S., Kanazawa, A., Li, H.: PIFu: pixel-aligned implicit function for high-resolution clothed human digitization. In: ICCV (2019)

    Google Scholar 

  56. Saito, S., Yang, J., Ma, Q., Black, M.J.: SCANimate: weakly supervised learning of skinned clothed avatar networks. In: CVPR (2021)

    Google Scholar 

  57. Spurr, A., Dahiya, A., Wang, X., Zhang, X., Hilliges, O.: Self-supervised 3D hand pose estimation from monocular RGB via contrastive learning. In: CVPR (2021)

    Google Scholar 

  58. Sridhar, S., Mueller, F., Zollhöfer, M., Casas, D., Oulasvirta, A., Theobalt, C.: Real-time joint tracking of a hand manipulating an object from RGB-D input. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9906, pp. 294–310. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46475-6_19

    Chapter  Google Scholar 

  59. Su, S.Y., Yu, F., Zollhöfer, M., Rhodin, H.: A-NeRF: articulated neural radiance fields for learning human shape, appearance, and pose. In: NeurIPS (2021)

    Google Scholar 

  60. Sun, X., Xiao, B., Wei, F., Liang, S., Wei, Y.: Integral human pose regression. In: ECCV (2018)

    Google Scholar 

  61. Supančič, J.S., Rogez, G., Yang, Y., Shotton, J., Ramanan, D.: Depth-based hand pose estimation: methods, data, and challenges. Int. J. Comput. Vis. 126(11), 1180–1198 (2018). https://doi.org/10.1007/s11263-018-1081-7

    Article  Google Scholar 

  62. Tekin, B., Bogo, F., Pollefeys, M.: H+O: unified egocentric recognition of 3D hand-object poses and interactions. In: CVPR (2019)

    Google Scholar 

  63. Tsoli, A., Argyros, A.A.: Joint 3D tracking of a deformable object in interaction with a hand. In: ECCV (2018)

    Google Scholar 

  64. Tzionas, D., Gall, J.: 3D object reconstruction from hand-object interactions. In: ICCV (2015)

    Google Scholar 

  65. Wang, J., et al.: RGB2Hands: real-time tracking of 3D hand interactions from monocular RGB video. TOG 39(6), 1–16 (2020)

    Google Scholar 

  66. Wang, N., Zhang, Y., Li, Z., Fu, Y., Liu, W., Jiang, Y.G.: Pixel2Mesh: generating 3D mesh models from single RGB images. In: ECCV (2018)

    Google Scholar 

  67. Wang, Y., et al.: Video-based hand manipulation capture through composite motion control. TOG 34(4), 1–14 (2013)

    MATH  Google Scholar 

  68. Xiang, Y., Schmidt, T., Narayanan, V., Fox, D.: PoseCNN: a convolutional neural network for 6D object pose estimation in cluttered scenes. In: RSS (2018)

    Google Scholar 

  69. Xiong, F., et al.: A2J: anchor-to-joint regression network for 3D articulated pose estimation from a single depth image. In: ICCV (2019)

    Google Scholar 

  70. Xu, Q., Wang, W., Ceylan, D., Mech, R., Neumann, U.: DISN: deep implicit surface network for high-quality single-view 3D reconstruction. In: NeurIPS (2019)

    Google Scholar 

  71. Yang, L., Zhan, X., Li, K., Xu, W., Li, J., Lu, C.: CPF: learning a contact potential field to model the hand-object interaction. In: ICCV (2021)

    Google Scholar 

  72. Ye, Y., Gupta, A., Tulsiani, S.: What’s in your hands? 3D reconstruction of generic objects in hands. In: CVPR (2022)

    Google Scholar 

  73. Zhang, H., Zhou, Y., Tian, Y., Yong, J.H., Xu, F.: Single depth view based real-time reconstruction of hand-object interactions. TOG 40(3), 1–12 (2021)

    Google Scholar 

  74. Zhou, Y., Habermann, M., Xu, W., Habibie, I., Theobalt, C., Xu, F.: Monocular real-time hand shape and motion capture using multi-modal data. In: CVPR (2020)

    Google Scholar 

  75. Zimmermann, C., Brox, T.: Learning to estimate 3D hand pose from single RGB images. In: ICCV (2017)

    Google Scholar 

  76. Zimmermann, C., Ceylan, D., Yang, J., Russell, B., Argus, M., Brox, T.: FreiHAND: a dataset for markerless capture of hand pose and shape from single RGB images. In: ICCV (2019)

    Google Scholar 

Download references

Acknowledgements

This work was granted access to the HPC resources of IDRIS under the allocation AD011013147 made by GENCI. This work was funded in part by the French government under management of Agence Nationale de la Recherche as part of the “Investissements d’avenir” program, reference ANR19-P3IA-0001 (PRAIRIE 3IA Institute) and by Louis Vuitton ENS Chair on Artificial Intelligence.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Zerui Chen .

Editor information

Editors and Affiliations

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 12732 KB)

Rights and permissions

Reprints and permissions

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Chen, Z., Hasson, Y., Schmid, C., Laptev, I. (2022). AlignSDF: Pose-Aligned Signed Distance Fields for Hand-Object Reconstruction. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13661. Springer, Cham. https://doi.org/10.1007/978-3-031-19769-7_14

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-19769-7_14

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19768-0

  • Online ISBN: 978-3-031-19769-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics