Sca-pose: category-level 6D pose estimation with adaptive shape prior based on CNN and graph convolution

  • Original Research Paper
Intelligent Service Robotics

Abstract

Category-level 6D pose estimation aims to accurately predict the spatial position, orientation and scale of unseen objects belonging to a specific category. Existing methods generally fall into two categories. Prior-based approaches typically rely on the Umeyama algorithm and achieve high accuracy, but suffer from training limitations and computational overhead. End-to-end methods train efficiently, but often underperform because they lack category-specific prior knowledge. To bridge this gap, we propose a novel framework, SCA-Pose, that leverages the advantages of both approaches. SCA-Pose consists of a main network for efficient inference and an auxiliary network for enhanced accuracy. The main network, featuring local and global feature fusion modules (CNN and HS-Net) and a trainable pose regressor, enables end-to-end learning for real-time applications. The auxiliary network further refines the pose prediction by enforcing intrinsic geometric consistency constraints between Normalized Object Coordinate Space (NOCS) coordinates and the object pose and size, while adaptively accounting for intra-class shape variations. Experimental results on the REAL275 and CAMERA25 datasets show that SCA-Pose significantly outperforms the existing baseline method (RBP-Pose) and runs in real time (30 FPS).
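
For readers unfamiliar with the Umeyama step used by prior-based approaches, the following is a minimal NumPy sketch of the least-squares similarity alignment between predicted NOCS coordinates and observed camera-frame points. It is an illustration under stated assumptions, not the SCA-Pose implementation: the function name `umeyama_similarity` and its argument names are hypothetical, the correspondences are assumed to be given and outlier-free, and practical pipelines commonly wrap such a solver in an outlier-rejection loop.

```python
import numpy as np


def umeyama_similarity(src, dst):
    """Least-squares similarity transform (Umeyama, 1991).

    Finds scale s, rotation R and translation t minimising
    sum_i || dst_i - (s * R @ src_i + t) ||^2.

    src: (N, 3) points, e.g. predicted NOCS coordinates (assumed input).
    dst: (N, 3) corresponding points observed in the camera frame.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]

    # Centre both point sets.
    mu_src = src.mean(axis=0)
    mu_dst = dst.mean(axis=0)
    src_c = src - mu_src
    dst_c = dst - mu_dst

    # Cross-covariance between target and source.
    cov = dst_c.T @ src_c / n
    U, D, Vt = np.linalg.svd(cov)

    # Reflection guard: force det(R) = +1.
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / n          # variance of the source set
    scale = (D * np.diag(S)).sum() / var_src  # trace(diag(D) @ S) / var_src
    t = mu_dst - scale * (R @ mu_src)
    return scale, R, t


if __name__ == "__main__":
    # Synthetic check: transform random NOCS-like points with a known pose.
    rng = np.random.default_rng(0)
    nocs = rng.uniform(-0.5, 0.5, size=(200, 3))
    theta = 0.7
    R_true = np.array([
        [np.cos(theta), -np.sin(theta), 0.0],
        [np.sin(theta),  np.cos(theta), 0.0],
        [0.0,            0.0,           1.0],
    ])
    observed = 0.25 * nocs @ R_true.T + np.array([0.1, -0.2, 0.9])
    s, R, t = umeyama_similarity(nocs, observed)
    print("scale:", s)
    print("translation:", t)
```

The recovered scale corresponds to object size and the rotation and translation to the 6D pose, which is the geometric relationship between NOCS coordinates and pose that the abstract refers to. Because this closed-form solve runs outside the network, it is not trained end to end, which is one reason the abstract contrasts it with a trainable pose regressor.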

Acknowledgements

This work was supported by the National Natural Science Foundation of China (62373016) and the Open Projects Program of the State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS-2023-22).

Author information

Corresponding author

Correspondence to Guoyu Zuo.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zuo, G., Yu, S., Yu, S. et al. Sca-pose: category-level 6D pose estimation with adaptive shape prior based on CNN and graph convolution. Intel Serv Robotics 18, 351–361 (2025). https://doi.org/10.1007/s11370-025-00587-0

  • DOI: https://doi.org/10.1007/s11370-025-00587-0

Keywords