Abstract
Category-level object pose estimation predicts the poses and sizes of unseen objects within a given category and plays a crucial role in a wide range of practical applications. However, accurate pose estimation remains challenging due to substantial shape variations within the same category. To address this issue, this paper introduces a novel shape-descriptor-guided learning network for object pose estimation. By capturing the geometric information of an object's shape, the shape descriptor provides valuable guidance for subsequent feature learning and effectively handles shape variations. Moreover, our framework incorporates a confidence-based pose estimator that assigns a confidence score to each pose prediction; by penalizing low-confidence predictions, it yields more accurate poses with higher confidence. Experimental results on the CAMERA25 and REAL275 datasets demonstrate the superiority of our approach over state-of-the-art methods.
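To make the confidence mechanism concrete, below is a minimal PyTorch sketch of a confidence-weighted pose loss in the spirit of the estimator described above. The per-hypothesis weighting and the -log(confidence) regularizer follow common practice in prior pose-estimation work (e.g., DenseFusion) and are illustrative assumptions, not the paper's exact formulation.

import torch

def confidence_weighted_pose_loss(per_point_loss, confidence, reg_weight=0.015):
    # per_point_loss: (N,) pose error for each per-point pose hypothesis
    # confidence:     (N,) predicted confidence in (0, 1) for each hypothesis
    # reg_weight:     weight of the -log(confidence) term that keeps the network
    #                 from trivially predicting low confidence everywhere
    #                 (the value 0.015 is an illustrative assumption)
    # Down-weight unreliable hypotheses while penalizing low confidence.
    return torch.mean(confidence * per_point_loss - reg_weight * torch.log(confidence))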
Acknowledgements
The work described in this paper was fully supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (No. UGC/FDS16/E14/21).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, Y. et al. (2025). Shape Descriptor Guided Learning for Category-Level Object Pose Estimation. In: Magnenat-Thalmann, N., Kim, J., Sheng, B., Deng, Z., Thalmann, D., Li, P. (eds) Advances in Computer Graphics. CGI 2024. Lecture Notes in Computer Science, vol 15340. Springer, Cham. https://doi.org/10.1007/978-3-031-82024-3_4