Abstract
Category-level object pose estimation aims to predict the 6D pose, as well as the 3D metric size, of arbitrary objects from a known set of categories. Recent methods harness shape prior adaptation to map the observed point cloud into the canonical space and then apply the Umeyama algorithm to recover the pose and size. However, their shape prior integration strategy boosts pose estimation only indirectly, leading to insufficient pose-sensitive feature extraction and slow inference. To tackle this problem, we propose RBP-Pose, a novel geometry-guided Residual Object Bounding Box Projection network that jointly predicts the object pose and residual vectors describing the displacements from the shape-prior-indicated projections of the object surface onto the bounding box towards the real surface projections. This definition of the residual vectors is inherently zero-mean and relatively small in magnitude, and it explicitly encapsulates spatial cues of the 3D object for robust and accurate pose regression. We further enforce geometry-aware consistency terms that align the predicted pose and residual vectors to boost performance. Finally, to avoid overfitting and enhance the generalization ability of RBP-Pose, we propose an online non-linear shape augmentation scheme that promotes shape diversity during training. Extensive experiments on the NOCS datasets demonstrate that RBP-Pose surpasses all existing methods by a large margin while achieving real-time inference speed.
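The Umeyama algorithm mentioned in the abstract recovers the least-squares similarity transform (scale, rotation, translation) that maps canonical-space points onto observed points. A minimal NumPy sketch of this classical step (the function name `umeyama` and the argument layout are our own for illustration, not the paper's code):

```python
import numpy as np

def umeyama(src, dst):
    """Least-squares similarity transform (s, R, t) with dst ~ s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding 3D points.
    Returns scale s (float), rotation R (3x3), translation t (3,).
    """
    n = src.shape[0]
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    # Cross-covariance between the centred point sets
    cov = dst_c.T @ src_c / n
    U, D, Vt = np.linalg.svd(cov)

    # Reflection guard: force a proper rotation (det(R) = +1)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / n
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t
```

In the shape-prior pipelines the paper discusses, `src` would be the predicted canonical (NOCS) coordinates and `dst` the observed depth points, with `s` giving the metric size scaling; RBP-Pose instead regresses the pose directly, avoiding this post-hoc solve at inference time.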
R. Zhang and Y. Di—Equal contributions.
Code is released at https://github.com/lolrudy/RBP_Pose.
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Zhang, R., Di, Y., Lou, Z., Manhardt, F., Tombari, F., Ji, X. (2022). RBP-Pose: Residual Bounding Box Projection for Category-Level Pose Estimation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13661. Springer, Cham. https://doi.org/10.1007/978-3-031-19769-7_38
Print ISBN: 978-3-031-19768-0
Online ISBN: 978-3-031-19769-7