
EFN6D: an efficient RGB-D fusion network for 6D pose estimation

  • Original Research
  • Published in: Journal of Ambient Intelligence and Humanized Computing

Abstract

Precise 6DoF (6D) object pose estimation is essential for many intelligent applications, such as robot grasping, virtual reality, and autonomous driving. Lacking depth information, traditional pose estimators that rely on RGB cameras alone consistently predict biased 3D rotation and translation matrices. With the wide adoption of RGB-D cameras, we can directly capture both the depth of the object relative to the camera and the corresponding RGB image. Most existing methods concatenate these two data sources directly, which fails to exploit their complementary relationship. We therefore propose EFN6D, an efficient RGB-D fusion network for 6D pose estimation that exploits 2D–3D features more thoroughly. Instead of using the original single-channel depth map directly, we encode the depth information into a normal map and point cloud data. To effectively fuse the object's surface texture features with its geometric contour features, we feed the RGB image and the normal map into two ResNets. In addition, PSP modules and skip connections between the two ResNets not only improve the network's cross-modal fusion but also strengthen its ability to handle objects at different scales. Finally, the fused features from these two ResNets and the point cloud features are densely fused point by point, further reinforcing the fusion of 2D and 3D information at the per-pixel level. Experiments on the LINEMOD and YCB-Video datasets show that EFN6D outperforms state-of-the-art methods by a large margin.



Acknowledgements

This work was supported by the National Natural Science Foundation of China (NSFC) under grants U2033218, 61831018, and 61772328.

Author information


Correspondence to Xiaoyan Jiang or Zhijun Fang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Wang, Y., Jiang, X., Fujita, H. et al. EFN6D: an efficient RGB-D fusion network for 6D pose estimation. J Ambient Intell Human Comput 15, 75–88 (2024). https://doi.org/10.1007/s12652-022-03874-1
