Abstract
Accurate object pose estimation is a prerequisite for successful robotic grasping. Keypoint-based pose estimation methods using RGB-D data have shown promising results in simple environments, but how to fuse the complementary features of RGB-D data remains a challenging problem. To this end, this paper proposes a two-branch network with an attention-aware bi-gated fusion (A2BF) module for keypoint-based 6D object pose estimation, abbreviated as A2BNet. The A2BF module consists of two key components, a bidirectional gated fusion module and an attention mechanism, which together extract information from both RGB and point cloud data, prioritizing crucial details while disregarding irrelevant information. Several A2BF modules can be embedded in the network to generate complementary texture and geometric information. Extensive experiments are conducted on the public LineMOD and Occlusion LineMOD datasets. The results show that the average accuracy of the proposed method reaches 99.8% and 67.6% on the two datasets, respectively, outperforming state-of-the-art methods.
Supported by the Natural Science Foundation of China (62272322, 62002246, 62272323), the Project of Beijing Municipal Education Commission (KM202010028010), and the Applied Basic Research Project of Liaoning Province (2022JH2/101300279).
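The abstract describes A2BF as the pairing of bidirectional gated fusion with an attention mechanism over per-point RGB and point-cloud feature streams. As a rough illustration only, the sketch below shows what such a fusion step could look like in PyTorch; the class name, layer choices (1x1 convolution gates, squeeze-and-excitation-style channel attention), and feature shapes are assumptions made for this sketch and are not taken from the paper.

# Minimal PyTorch sketch of an attention-aware bi-gated fusion step.
# All layer choices (1x1 conv gates, SE-style channel attention) are
# assumptions for illustration; the paper's actual A2BF design may differ.
import torch
import torch.nn as nn


class BiGatedFusionSketch(nn.Module):
    """Fuses per-point RGB features and point-cloud features, both [B, C, N]."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Gates: each modality produces a 0-1 mask that modulates the other.
        self.gate_rgb_to_pcd = nn.Sequential(nn.Conv1d(channels, channels, 1), nn.Sigmoid())
        self.gate_pcd_to_rgb = nn.Sequential(nn.Conv1d(channels, channels, 1), nn.Sigmoid())
        # Channel attention over the concatenated features (SE-style, assumed).
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),
            nn.Conv1d(2 * channels, 2 * channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv1d(2 * channels // reduction, 2 * channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, feat_rgb: torch.Tensor, feat_pcd: torch.Tensor) -> torch.Tensor:
        # Bidirectional gating: texture features refine geometry and vice versa.
        pcd_refined = feat_pcd + feat_pcd * self.gate_rgb_to_pcd(feat_rgb)
        rgb_refined = feat_rgb + feat_rgb * self.gate_pcd_to_rgb(feat_pcd)
        fused = torch.cat([rgb_refined, pcd_refined], dim=1)  # [B, 2C, N]
        # Channel attention re-weights the fused features, emphasizing
        # informative cues before they are passed downstream.
        return fused * self.attn(fused)


if __name__ == "__main__":
    block = BiGatedFusionSketch(channels=128)
    rgb = torch.randn(2, 128, 1024)  # per-point appearance features (assumed shape)
    pcd = torch.randn(2, 128, 1024)  # per-point geometric features (assumed shape)
    print(block(rgb, pcd).shape)     # torch.Size([2, 256, 1024])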
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wang, L., Lu, W., Tian, Y., Guan, Y., Shao, Z., Shi, Z. (2024). 6D Object Pose Estimation with Attention Aware Bi-gated Fusion. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Lecture Notes in Computer Science, vol 14448. Springer, Singapore. https://doi.org/10.1007/978-981-99-8082-6_44
DOI: https://doi.org/10.1007/978-981-99-8082-6_44
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8081-9
Online ISBN: 978-981-99-8082-6
eBook Packages: Computer Science, Computer Science (R0)