Abstract
Category-level 6D pose estimation aims to accurately predict the position, orientation and size of previously unseen objects that belong to known categories. Existing methods generally fall into two groups. Prior-based approaches, which typically recover the pose with the Umeyama algorithm, achieve high accuracy but suffer from training limitations and computational overhead. End-to-end methods train efficiently but often underperform because they lack category-specific prior knowledge. To bridge this gap, we propose SCA-Pose, a framework that combines the advantages of both approaches. SCA-Pose consists of a main network for efficient inference and an auxiliary network for improved accuracy. The main network combines local and global feature-fusion modules (a CNN and HS-Net) with a trainable pose regressor, which enables end-to-end learning for real-time applications. The auxiliary network further refines the pose prediction by imposing intrinsic geometric consistency constraints between Normalized Object Coordinate Space (NOCS) coordinates and the object pose and size, while adaptively accounting for intra-class shape variations. Experimental results on the REAL275 and CAMERA25 datasets show that SCA-Pose significantly outperforms the baseline method RBP-Pose while running in real time (30 FPS).
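For readers unfamiliar with the Umeyama step used by prior-based methods, the following is a minimal sketch of the closed-form similarity fit (scale, rotation, translation) between predicted NOCS coordinates and observed camera-space points, together with a simple residual that measures how consistent a candidate pose and size are with the NOCS coordinates. It is written in Python with NumPy under our own assumptions: the function names `umeyama` and `consistency_residual` are hypothetical, the size estimate from the extent of the NOCS points is a simplification, and the code is an illustration of the general technique, not the SCA-Pose implementation.

```python
import numpy as np


def umeyama(src: np.ndarray, dst: np.ndarray):
    """Least-squares similarity transform (Umeyama, 1991) mapping src -> dst.

    src, dst: (N, 3) arrays of corresponding points, e.g. predicted NOCS
    coordinates (src) and observed camera-space points (dst).
    Returns (scale, R, t) minimizing sum ||dst_i - (scale * R @ src_i + t)||^2.
    """
    n = src.shape[0]
    mu_src = src.mean(axis=0)
    mu_dst = dst.mean(axis=0)
    src_c = src - mu_src
    dst_c = dst - mu_dst

    # Variance of the centred source points and the 3x3 cross-covariance.
    var_src = (src_c ** 2).sum() / n
    cov = dst_c.T @ src_c / n

    U, d, Vt = np.linalg.svd(cov)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    scale = np.trace(np.diag(d) @ S) / var_src
    t = mu_dst - scale * R @ mu_src
    return scale, R, t


def consistency_residual(nocs, points, scale, R, t):
    """Mean distance between observed points and NOCS points mapped by (scale, R, t)."""
    pred = scale * nocs @ R.T + t
    return float(np.linalg.norm(points - pred, axis=1).mean())


# Synthetic, noise-free example: random NOCS points and a known similarity transform.
rng = np.random.default_rng(0)
nocs = rng.uniform(-0.5, 0.5, size=(500, 3))
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R_true = q * np.sign(np.linalg.det(q))  # for 3x3, negating q flips det to +1
s_true = 0.25
t_true = np.array([0.1, -0.05, 0.7])
points = s_true * nocs @ R_true.T + t_true

scale, R, t = umeyama(nocs, points)
# Simplified metric size: extent of the observed NOCS points scaled by the fit.
size = scale * (nocs.max(axis=0) - nocs.min(axis=0))
print(scale, consistency_residual(nocs, points, scale, R, t), size)
```

On noise-free data the fit recovers the generating transform exactly up to floating-point error. With real network predictions the correspondences contain outliers, so a fit like this would typically be wrapped in an outlier-rejection loop such as RANSAC; this non-trainable post-processing is the kind of step that the end-to-end pose regressor described above avoids.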

References
Mousavian A, Eppner C, Fox D (2019) 6-DOF GraspNet: variational grasp generation for object manipulation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2901–2910
Deng X, Xiang Y, Mousavian A, Eppner C, Bretl T, Fox D (2020) Self-supervised 6D object pose estimation for robot manipulation. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 3665–3671. IEEE
Su Y, Rambach J, Minaskan N, Lesur P, Pagani A, Stricker D (2019) Deep multi-state object pose estimation for augmented reality assembly. In 2019 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), pp. 222–227. IEEE
Zhang C, Cui Z, Zhang Y, Zeng B, Pollefeys M, Liu S (2021) Holistic 3D scene understanding from a single image with implicit representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8833–8842
Yuan Z, Song X, Bai L, Wang Z, Ouyang W (2021) Temporal-channel transformer for 3D lidar-based video object detection for autonomous driving. IEEE Trans Circuits Syst Video Technol 32(4):2068–2078
Gorschlüter F, Rojtberg P, Pöllabauer T (2022) A survey of 6D object detection based on 3D models for industrial applications. J Imaging 8(3):53
Zheng L, Wang C, Sun Y, Dasgupta E, Chen H, Leonardis A, Zhang W, Chang HJ (2023) HS-Pose: hybrid scope feature extraction for category-level object pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 17163–17173
Wang H, Sridhar S, Huang J, Valentin J, Song S, Guibas LJ (2019) Normalized object coordinate space for category-level 6D object pose and size estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2642–2651
Umeyama S (1991) Least-squares estimation of transformation parameters between two point patterns. IEEE Trans Pattern Anal Mach Intell 13(4):376–380
Tian M, Ang MH, Lee GH (2020) Shape prior deformation for categorical 6D object pose and size estimation. In Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXI, pp. 530–546. Springer
Chen K, Dou Q (2021) SGPA: structure-guided prior adaptation for category-level 6D object pose estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2773–2782
Kang W, Xiang Y, Wang F, You H (2021) DO-Net: dual-output network for land cover classification from optical remote sensing images. IEEE Geosci Remote Sens Lett 19:1–5
Li Y, Wan J, Miao Q, Escalera S, Fang H, Chen H, Qi X, Guo G (2020) CR-Net: a deep classification-regression network for multimodal apparent personality analysis. Int J Comput Vision 128:2763–2780
Zhang R, Di Y, Lou Z, Manhardt F, Tombari F, Ji X (2022) RBP-Pose: residual bounding box projection for category-level pose estimation. In European Conference on Computer Vision, pp. 655–672. Springer
Chen W, Jia X, Chang HJ, Duan J, Shen L, Leonardis A (2021) FS-Net: fast shape-based network for category-level 6D object pose estimation with decoupled rotation mechanism. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1581–1590
Lin J, Wei Z, Li Z, Xu S, Jia K, Li Y (2021) DualPoseNet: category-level 6D object pose and size estimation using dual pose network with refined learning of pose consistency. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3560–3569
Lin J, Wei Z, Zhang Y, Jia K (2023) VI-Net: boosting category-level 6D object pose estimation via learning decoupled rotations on the spherical representations. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 14001–14011
Li G, Zhu D, Zhang G, Shi W, Zhang T, Zhang X, Li J (2023) SD-Pose: structural discrepancy aware category-level 6D object pose estimation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 5685–5694
Zou L, Huang Z, Gu N, Wang G (2024) Learning geometric consistency and discrepancy for category-level 6D object pose estimation from point clouds. Pattern Recognit 145:109896
Zhang R, Di Y, Manhardt F, Tombari F, Ji X (2022) SSP-Pose: symmetry-aware shape prior deformation for direct category-level object pose estimation. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 7452–7459. IEEE
Lin Z-H, Huang S-Y, Wang Y-CF (2020) Convolution in the cloud: learning deformable kernels in 3D graph convolution networks for point cloud analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1800–1809
Liu J, Cao Z, Tang Y, Liu X, Tan M (2022) Category-level 6D object pose estimation with structure encoder and reasoning attention. IEEE Trans Circuits Syst Video Technol 32(10):6728–6740
Di Y, Zhang R, Lou Z, Manhardt F, Ji X, Navab N, Tombari F (2022) GPV-Pose: category-level object pose estimation via geometry-guided point-wise voting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6781–6791
Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. In International Conference on Learning Representations (ICLR), San Diego, California
Liu J, Sun W, Liu C, Zhang X, Fu Q (2023) Robotic continuous grasping system by shape transformer-guided multi-object category-level 6D pose estimation. IEEE Trans Ind Inform 19(11):11171–11181
Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767
Acknowledgements
This work was supported by the National Natural Science Foundation of China (62373016) and the Open Projects Program of State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS-2023-22).
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zuo, G., Yu, S., Yu, S. et al. SCA-Pose: category-level 6D pose estimation with adaptive shape prior based on CNN and graph convolution. Intel Serv Robotics 18, 351–361 (2025). https://doi.org/10.1007/s11370-025-00587-0
DOI: https://doi.org/10.1007/s11370-025-00587-0