Abstract
Despite recent efforts towards accurate 3D annotations in hand and object datasets, gaps remain in 3D hand and object reconstruction. Existing works leverage contact maps to refine inaccurate hand-object pose estimations and to generate grasps given object models. However, they require explicit 3D supervision, which is seldom available; they are therefore limited to constrained settings, e.g., where thermal cameras observe residual heat left on manipulated objects. In this paper, we propose a novel semi-supervised framework that learns contact from monocular images. Specifically, we leverage visual and geometric consistency constraints in large-scale datasets to generate pseudo-labels for semi-supervised learning, and propose an efficient graph-based network to infer contact. Our semi-supervised learning framework achieves a favourable improvement over existing supervised learning methods trained on data with limited annotations. Notably, our proposed model achieves superior results with less than half the network parameters and memory access cost of the commonly-used PointNet-based approach. We show the benefits of using a contact map that governs hand-object interactions to produce more accurate reconstructions. We further demonstrate that training with pseudo-labels extends contact map estimation to out-of-domain objects and generalises better across multiple datasets. Our project page is available at https://eldentse.github.io/s2contact/.
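The abstract's core idea of consistency-filtered pseudo-labelling can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, score names, and thresholds are all hypothetical, and the "consistency" values stand in for the paper's visual (photometric) and geometric checks.

```python
def select_pseudo_labels(pred_contacts, photo_consistency, geo_consistency,
                         photo_thresh=0.8, geo_thresh=0.8):
    """Keep a predicted contact map as a pseudo-label only when both its
    photometric and geometric consistency scores clear a threshold.
    All names and threshold values here are illustrative assumptions."""
    kept = []
    for contact, pc, gc in zip(pred_contacts, photo_consistency, geo_consistency):
        if pc >= photo_thresh and gc >= geo_thresh:
            kept.append(contact)
    return kept

# Toy example: three predicted per-vertex contact maps with consistency scores.
preds = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
photo = [0.95, 0.60, 0.85]   # second prediction fails the photometric check
geo   = [0.90, 0.92, 0.88]
kept = select_pseudo_labels(preds, photo, geo)
print(len(kept))  # → 2
```

The surviving predictions would then be treated as ground truth when training the contact-estimation network on unlabelled images.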
T. H. E. Tse and Z. Zhang contributed equally.
Acknowledgements
This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP–2022–2020–0–01789) supervised by the IITP (Institute of Information & Communications Technology Planning & Evaluation) and the Baskerville Tier 2 HPC service (https://www.baskerville.ac.uk/) funded by the Engineering and Physical Sciences Research Council (EPSRC) and UKRI through the World Class Labs scheme (EP/T022221/1) and the Digital Research Infrastructure programme (EP/W032244/1) operated by Advanced Research Computing at the University of Birmingham. KIK was supported by the National Research Foundation of Korea (NRF) grant (No. 2021R1A2C2012195) and IITP grants (IITP–2021–0–02068 and IITP–2020–0–01336). ZQZ was supported by China Scholarship Council (CSC) Grant No. 202208060266. AL was supported in part by the EPSRC (grant number EP/S032487/1). FZ was supported by the National Natural Science Foundation of China under Grant No. 61972188 and 62122035.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tse, T.H.E., Zhang, Z., Kim, K.I., Leonardis, A., Zheng, F., Chang, H.J. (2022). S\(^2\)Contact: Graph-Based Network for 3D Hand-Object Contact Estimation with Semi-supervised Learning. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13661. Springer, Cham. https://doi.org/10.1007/978-3-031-19769-7_33
DOI: https://doi.org/10.1007/978-3-031-19769-7_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19768-0
Online ISBN: 978-3-031-19769-7