Shape Descriptor Guided Learning for Category-Level Object Pose Estimation

  • Conference paper
Advances in Computer Graphics (CGI 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15340)

Abstract

Category-level object pose estimation underpins a wide range of practical applications by predicting the poses and sizes of unseen objects within a specific category. Accurate estimation remains a significant challenge, however, because shapes vary substantially within the same category. To address this issue, this paper introduces a novel learning network for object pose estimation that is guided by a shape descriptor. By capturing the geometric information of an object’s shape, the shape descriptor provides input for subsequent feature learning that helps the network handle shape variations. The framework also incorporates a confidence-based pose estimator, which assigns a confidence score to each pose prediction; penalizing low-confidence predictions yields more accurate poses with higher confidence. Experimental results on the CAMERA25 and REAL275 datasets show that the approach outperforms state-of-the-art methods.
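The abstract does not give the form of the confidence-based estimator, so the following is only a minimal sketch of one common way to couple per-prediction confidences with a pose loss. It assumes a DenseFusion-style objective, in which each prediction's error is weighted by its confidence and a log term penalizes low confidence. The function names, the weight `w`, and the choice of this particular objective are assumptions for illustration, not the authors' formulation.

```python
import math


def confidence_weighted_loss(errors, confidences, w=0.01):
    """Mean of e_i * c_i - w * log(c_i) over all pose predictions.

    errors:      per-prediction pose errors (hypothetical inputs)
    confidences: per-prediction confidence scores in (0, 1]
    w:           assumed weight of the low-confidence penalty
    """
    if not errors or len(errors) != len(confidences):
        raise ValueError("errors and confidences must be non-empty and equal in length")
    total = 0.0
    for e, c in zip(errors, confidences):
        if not 0.0 < c <= 1.0:
            raise ValueError("each confidence must lie in (0, 1]")
        # Weighting by c lets the model discount errors it is unsure about,
        # while -w * log(c) penalizes it for being unsure everywhere.
        total += e * c - w * math.log(c)
    return total / len(errors)


def select_pose(poses, confidences):
    """Return the pose prediction with the highest confidence score."""
    best = max(range(len(poses)), key=lambda i: confidences[i])
    return poses[best]
```

Under this assumed objective, placing high confidence on a low-error prediction gives a smaller loss than placing it on a high-error one, and at inference time the highest-confidence prediction is taken as the final pose.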

Acknowledgements

The work described in this paper was fully supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (No. UGC/FDS16/E14/21).

Author information

Corresponding author

Correspondence to Weiming Wang.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, Y. et al. (2025). Shape Descriptor Guided Learning for Category-Level Object Pose Estimation. In: Magnenat-Thalmann, N., Kim, J., Sheng, B., Deng, Z., Thalmann, D., Li, P. (eds) Advances in Computer Graphics. CGI 2024. Lecture Notes in Computer Science, vol 15340. Springer, Cham. https://doi.org/10.1007/978-3-031-82024-3_4

  • DOI: https://doi.org/10.1007/978-3-031-82024-3_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-82023-6

  • Online ISBN: 978-3-031-82024-3

  • eBook Packages: Computer Science, Computer Science (R0)
