Abstract
Accurate object pose estimation is a prerequisite for successful robotic grasping. Keypoint-based pose estimation methods using RGB-D data have shown promising results in simple environments, but how to fuse the complementary features of RGB-D data remains a challenging problem. To this end, this paper proposes a two-branch network with an attention-aware bi-gated fusion (A2BF) module for keypoint-based 6D object pose estimation, abbreviated as A2BNet. The A2BF module consists of two key components, a bidirectional gated fusion module and an attention mechanism, which together extract information from both RGB and point cloud data, prioritizing crucial details while disregarding irrelevant information. Several A2BF modules can be embedded in the network to generate complementary texture and geometric information. Extensive experiments are conducted on the public LineMOD and Occlusion LineMOD datasets. The results show that the average accuracy of the proposed method reaches 99.8% and 67.6% on the two datasets, respectively, outperforming state-of-the-art methods.
Supported by the Natural Science Foundation of China (62272322, 62002246, 62272323), the Project of Beijing Municipal Education Commission (KM202010028010), and the Applied Basic Research Project of Liaoning Province (2022JH2/101300279).
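The abstract describes A2BF as the pairing of bidirectional gated fusion with an attention mechanism over per-point RGB and point-cloud feature streams. As a rough illustration only, the sketch below shows what such a fusion step could look like in PyTorch; the class name, layer choices (1x1 convolution gates, squeeze-and-excitation-style channel attention), and feature shapes are assumptions made for this sketch and are not taken from the paper.

# Minimal PyTorch sketch of an attention-aware bi-gated fusion step.
# All layer choices (1x1 conv gates, SE-style channel attention) are
# assumptions for illustration; the paper's actual A2BF design may differ.
import torch
import torch.nn as nn


class BiGatedFusionSketch(nn.Module):
    """Fuses per-point RGB features and point-cloud features, both [B, C, N]."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Gates: each modality produces a 0-1 mask that modulates the other.
        self.gate_rgb_to_pcd = nn.Sequential(nn.Conv1d(channels, channels, 1), nn.Sigmoid())
        self.gate_pcd_to_rgb = nn.Sequential(nn.Conv1d(channels, channels, 1), nn.Sigmoid())
        # Channel attention over the concatenated features (SE-style, assumed).
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),
            nn.Conv1d(2 * channels, 2 * channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv1d(2 * channels // reduction, 2 * channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, feat_rgb: torch.Tensor, feat_pcd: torch.Tensor) -> torch.Tensor:
        # Bidirectional gating: texture features refine geometry and vice versa.
        pcd_refined = feat_pcd + feat_pcd * self.gate_rgb_to_pcd(feat_rgb)
        rgb_refined = feat_rgb + feat_rgb * self.gate_pcd_to_rgb(feat_pcd)
        fused = torch.cat([rgb_refined, pcd_refined], dim=1)  # [B, 2C, N]
        # Channel attention re-weights the fused features, emphasizing
        # informative cues before they are passed downstream.
        return fused * self.attn(fused)


if __name__ == "__main__":
    block = BiGatedFusionSketch(channels=128)
    rgb = torch.randn(2, 128, 1024)  # per-point appearance features (assumed shape)
    pcd = torch.randn(2, 128, 1024)  # per-point geometric features (assumed shape)
    print(block(rgb, pcd).shape)     # torch.Size([2, 256, 1024])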
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wang, L., Lu, W., Tian, Y., Guan, Y., Shao, Z., Shi, Z. (2024). 6D Object Pose Estimation with Attention Aware Bi-gated Fusion. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Lecture Notes in Computer Science, vol 14448. Springer, Singapore. https://doi.org/10.1007/978-981-99-8082-6_44
DOI: https://doi.org/10.1007/978-981-99-8082-6_44
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8081-9
Online ISBN: 978-981-99-8082-6
eBook Packages: Computer Science, Computer Science (R0)