
Improving Zero-Shot Template-Based 6D Pose Estimation with Geometric Features

  • Conference paper
Advances in Visual Computing (ISVC 2024)

Abstract

6D object pose estimation is a fundamental problem in robotics and augmented reality. Most state-of-the-art approaches rely on deep learning and require large sets of training images depicting the target objects. A growing number of algorithms instead try to generalize from a set of objects available at training time to unseen objects at test time. Among these, GigaPose is a template-based approach that renders the target object during an onboarding phase shortly before inference and matches learned latent codes of these renderings against those of the observed objects. While learned representations prove powerful across a wide range of tasks, we propose integrating additional purely geometric features, which can be extracted essentially for free from the available 3D meshes during the onboarding phase. This representation then serves as an additional input for template matching and 2D-2D correspondence matching in our approach. We consider multiple relevant features and, implementing one of them, demonstrate improved performance on the core datasets of the BOP Challenge. Our results suggest that utilizing additional geometric features can indeed improve the relevant metrics at little additional cost.
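The abstract does not specify which geometric feature is implemented, but the general idea of extracting per-vertex geometry "for free" from a mesh during onboarding can be sketched with one common candidate: the discrete angle deficit, a standard proxy for Gaussian curvature. The function name and the toy tetrahedron below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def vertex_angle_deficit(vertices, faces):
    """Discrete Gaussian-curvature proxy: 2*pi minus the sum of interior
    angles of all triangles incident to each vertex. Purely geometric,
    computable once per mesh during an onboarding phase."""
    deficit = np.full(len(vertices), 2.0 * np.pi)
    for tri in faces:
        for i in range(3):
            # Interior angle at vertex tri[i] inside this triangle.
            p = vertices[tri[i]]
            a = vertices[tri[(i + 1) % 3]] - p
            b = vertices[tri[(i + 2) % 3]] - p
            cos_t = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
            deficit[tri[i]] -= np.arccos(np.clip(cos_t, -1.0, 1.0))
    return deficit

# Regular tetrahedron: three 60-degree angles meet at each vertex,
# so every vertex gets a deficit of pi (total 4*pi, per Gauss-Bonnet).
verts = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]], float)
tris = np.array([[0, 1, 2], [0, 3, 1], [0, 2, 3], [1, 3, 2]])

deficit = vertex_angle_deficit(verts, tris)
print(deficit)  # -> [pi, pi, pi, pi] (approx. 3.1416 each)
```

In a pipeline like the one described, such a per-vertex quantity could be baked into the template renderings as an extra channel and matched alongside the learned latent codes; the paper itself evaluates one concrete feature choice on the BOP core datasets.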



Acknowledgement

This research has received funding from the European Union’s Horizon Europe programme in the course of the ZDZW project under grant agreement No 101057404. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them.

Author information

Corresponding author

Correspondence to Thomas Pöllabauer.

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Pöllabauer, T., Weyel, J., Knauthe, V., Berkei, S., Kuijper, A. (2025). Improving Zero-Shot Template-Based 6D Pose Estimation with Geometric Features. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2024. Lecture Notes in Computer Science, vol 15046. Springer, Cham. https://doi.org/10.1007/978-3-031-77392-1_4

  • DOI: https://doi.org/10.1007/978-3-031-77392-1_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-77391-4

  • Online ISBN: 978-3-031-77392-1

  • eBook Packages: Computer Science (R0)
