Sca-pose: category-level 6D pose estimation with adaptive shape prior based on CNN and graph convolution

Zuo, Guoyu; Yu, Shan; Yu, Shuangyue; Liu, Hong; Zhao, Min

doi:10.1007/s11370-025-00587-0

Original Research Paper
Published: 12 February 2025

Volume 18, pages 351–361, (2025)

Intelligent Service Robotics

Guoyu Zuo ORCID: orcid.org/0000-0002-7624-4728^1,2,
Shan Yu^1,2,
Shuangyue Yu^1,2,
Hong Liu^1,2 &
…
Min Zhao^1,2

Abstract

Category-level 6D pose estimation aims to accurately predict the spatial position, orientation and scale of unseen objects belonging to a specific category. Existing methods generally fall into two groups: prior-based approaches, which typically rely on the Umeyama algorithm and achieve high accuracy but suffer from training limitations and computational overhead, and end-to-end methods, which train efficiently but often underperform because they lack category-specific prior knowledge. To bridge this gap, we propose SCA-Pose, a novel framework that leverages the advantages of both approaches. SCA-Pose consists of a main network for efficient inference and an auxiliary network for enhanced accuracy. The main network, which combines local and global feature fusion modules (CNN and HS-Net) with a trainable pose regressor, enables end-to-end learning for real-time applications. The auxiliary network further refines the pose prediction by enforcing the intrinsic geometric consistency between Normalized Object Coordinate Space (NOCS) coordinates and the object pose and size, while adaptively accounting for intra-class shape variations. Experimental results on the REAL275 and CAMERA25 datasets show that SCA-Pose significantly outperforms the existing baseline method (RBP-Pose) and runs in real time (30 FPS).
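For readers unfamiliar with the Umeyama step mentioned above, the following is a minimal sketch of the classical least-squares similarity alignment (Umeyama, 1991) that prior-based pipelines typically apply to predicted NOCS coordinates. It is not the SCA-Pose pose regressor, whose details are not given in the abstract. It assumes point correspondences are already known and recovers a scale s, rotation R and translation t such that each observed point p_i is approximately s R x_i + t, where x_i is the matching NOCS coordinate. The function name `umeyama_similarity` and all demo values are illustrative assumptions.

```python
import numpy as np


def umeyama_similarity(src, dst):
    """Least-squares s, R, t with dst ~= s * R @ src + t (Umeyama, 1991).

    src, dst: (N, 3) arrays of corresponding points. Hypothetical helper,
    not taken from the SCA-Pose implementation.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n, dim = src.shape

    # Centre both point sets.
    mu_src = src.mean(axis=0)
    mu_dst = dst.mean(axis=0)
    src_c = src - mu_src
    dst_c = dst - mu_dst

    # Cross-covariance between target and source, then its SVD.
    cov = dst_c.T @ src_c / n
    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    S = np.eye(dim)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[-1, -1] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / n          # variance of the source points
    s = np.trace(np.diag(D) @ S) / var_src    # isotropic scale
    t = mu_dst - s * R @ mu_src               # translation
    return s, R, t


# Synthetic check: NOCS-like points in a unit cube mapped by a known pose.
rng = np.random.default_rng(0)
nocs = rng.uniform(-0.5, 0.5, size=(100, 3))

angle = 0.5  # radians, rotation about the z axis (illustrative value)
R_true = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle),  np.cos(angle), 0.0],
    [0.0,            0.0,           1.0],
])
s_true = 0.3
t_true = np.array([0.1, -0.2, 0.8])

observed = s_true * nocs @ R_true.T + t_true
s_est, R_est, t_est = umeyama_similarity(nocs, observed)
print("scale error:", abs(s_est - s_true))
print("rotation error:", np.abs(R_est - R_true).max())
print("translation error:", np.abs(t_est - t_true).max())
```

Because this closed-form solve involves an SVD and sits outside the network, it is one plausible source of the training limitations and overhead the abstract attributes to prior-based methods; a trainable regressor avoids the step at inference time.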

References

Arsalan M, Clemens E, Dieter F (2019) 6-dof graspnet: Variational grasp generation for object manipulation. In Proceedings of the IEEE/CVF international conference on computer vision, pp. 2901–2910
Deng X, Yu X, Arsalan M, Clemens E, Timothy B, Dieter F (2020) Self-supervised 6d object pose estimation for robot manipulation. In 2020 IEEE international conference on robotics and automation (ICRA), pp. 3665–3671. IEEE
Yongzhi S, Jason R, Nareg M, Paul L, Alain P, Didier S (2019) Deep multi-state object pose estimation for augmented reality assembly. In 2019 IEEE International symposium on mixed and augmented reality adjunct (ISMAR-Adjunct), pp. 222–227. IEEE
Cheng Z, Zhaopeng C, Yinda Z, Bing Z, Marc P, Shuaicheng L (2021) Holistic 3d scene understanding from a single image with implicit representation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 8833–8842
Yuan Zhenxun, Song Xiao, Bai Lei, Wang Zhe, Ouyang Wanli (2021) Temporal-channel transformer for 3d lidar-based video object detection for autonomous driving. IEEE Trans Circuits Syst Video Technol 32(4):2068–2078
Gorschlüter Felix, Rojtberg Pavel, Pöllabauer Thomas (2022) A survey of 6d object detection based on 3d models for industrial applications. J Imaging 8(3):53
Linfang Z, Chen W, Yinghan S, Esha D, Hua C, Aleš L, Wei Z, Jin CH (2023) Hs-pose: hybrid scope feature extraction for category-level object pose estimation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 17163–17173
He W, Srinath S, Jingwei H, Julien V, Shuran S, Leonidas JG (2019) Normalized object coordinate space for category-level 6d object pose and size estimation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 2642–2651
Umeyama Shinji (1991) Least-squares estimation of transformation parameters between two point patterns. IEEE Trans Pattern Anal Mach Intell 13(04):376–380
Meng T, Marcelo HA, Hee LG (2020) Shape prior deformation for categorical 6d object pose and size estimation. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXI 16, pp. 530–546. Springer
Kai C, Qi D (2021) SGPA: structure-guided prior adaptation for category-level 6d object pose estimation. In Proceedings of the IEEE/CVF international conference on computer vision, pp. 2773–2782
Kang Wenchao, Xiang Yuming, Wang Feng, You Hongjian (2021) Do-net: dual-output network for land cover classification from optical remote sensing images. IEEE Geosci Remote Sens Lett 19:1–5
Li Yunan, Wan Jun, Miao Qiguang, Escalera Sergio, Fang Huijuan, Chen Huizhou, Qi Xiangda, Guo Guodong (2020) Cr-net: a deep classification-regression network for multimodal apparent personality analysis. Int J Comput Vision 128:2763–2780
Ruida Z, Yan D, Zhiqiang L, Fabian M, Federico T, Xiangyang J (2022) Rbp-pose: residual bounding box projection for category-level pose estimation. In European conference on computer vision, pp. 655–672. Springer
Wei C, Xi J, Hyung Jin C, Jinming D, Linlin S, Ales L (2021) Fs-net: fast shape-based network for category-level 6d object pose estimation with decoupled rotation mechanism. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 1581–1590
Jiehong L, Zewei W, Zhihao L, Songcen X, Kui J, Yuanqing L (2021) Dualposenet: category-level 6d object pose and size estimation using dual pose network with refined learning of pose consistency. In Proceedings of the IEEE/CVF international conference on computer vision, pp. 3560–3569
Jiehong L, Zewei W, Yabin Z, Kui J (2023) Vi-net: boosting category-level 6d object pose estimation via learning decoupled rotations on the spherical representations. In Proceedings of the IEEE/CVF international conference on computer vision, pp. 14001–14011
Guowei L, Dongchen Z, Guanghui Z, Wenjun S, Tianyu Z, Xiaolin Z, Jiamao L (2023) Sd-pose: structural discrepancy aware category-level 6d object pose estimation. In Proceedings of the IEEE/CVF Winter Conference on applications of computer vision, pp. 5685–5694
Zou Lu, Huang Zhangjin, Naijie Gu, Wang Guoping (2024) Learning geometric consistency and discrepancy for category-level 6d object pose estimation from point clouds. Pattern Recognit 145:109896
Ruida Z, Yan D, Fabian M, Federico T, Xiangyang J (2022) Ssp-pose: Symmetry-aware shape prior deformation for direct category-level object pose estimation. In 2022 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp. 7452–7459. IEEE
Zhi-Hao L, Sheng-Yu H, Yu-Chiang W Frank (2020) Convolution in the cloud: learning deformable kernels in 3d graph convolution networks for point cloud analysis. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 1800–1809
Liu Jierui, Cao Zhiqiang, Tang Yingbo, Liu Xilong, Tan Min (2022) Category-level 6d object pose estimation with structure encoder and reasoning attention. IEEE Trans Circuits Syst Video Technol 32(10):6728–6740
Yan D, Ruida Z, Zhiqiang L, Fabian M, Xiangyang J, Federico NNT (2022) Gpv-pose: category-level object pose estimation via geometry-guided point-wise voting. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 6781–6791
Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. In International conference on learning representations (ICLR), San Diego, California
Jian L, Wei S, Chongpei L, Xing Z, Qiang F (2023) Robotic continuous grasping system by shape transformer-guided multi-object category-level 6d pose estimation. IEEE Trans Ind Inform 19(11):11171–11181
Redmon J, Farhadi A (2018) Yolov3: an incremental improvement. arXiv preprint arXiv:1804.02767

Acknowledgements

This work was supported by the National Natural Science Foundation of China (62373016) and the Open Projects Program of State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS-2023-22).

Author information

Authors and Affiliations

School of Information Science and Technology, Beijing University of Technology, Beijing, 100124, China
Guoyu Zuo, Shan Yu, Shuangyue Yu, Hong Liu & Min Zhao
Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing, 100124, China
Guoyu Zuo, Shan Yu, Shuangyue Yu, Hong Liu & Min Zhao

Corresponding author

Correspondence to Guoyu Zuo.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zuo, G., Yu, S., Yu, S. et al. Sca-pose: category-level 6D pose estimation with adaptive shape prior based on CNN and graph convolution. Intel Serv Robotics 18, 351–361 (2025). https://doi.org/10.1007/s11370-025-00587-0

Received: 08 March 2024
Accepted: 11 January 2025
Published: 12 February 2025
Issue Date: March 2025
DOI: https://doi.org/10.1007/s11370-025-00587-0
