DFW-PVNet: data field weighting based pixel-wise voting network for effective 6D pose estimation

Published in: Applied Intelligence

Abstract

With the benefit of reduced memory and computational overhead, sparse-based 6 degrees-of-freedom (6D) pose estimation methods establish sparse two-dimensional (2D) to three-dimensional (3D) correspondences to estimate the pose of objects in an RGB image. However, this sparsity often degrades accuracy. In this paper, we propose a data field weighting based pixel-wise voting network (DFW-PVNet) that improves the accuracy of 6D pose estimation while keeping memory and computational overheads low. DFW-PVNet first assigns potential weights to pixels at different positions using data field theory, and then selects only the pixels with higher potential weights to vote for and locate the 2D keypoints. By building accurate sparse 2D-3D correspondences between the located 2D keypoints and the corresponding predefined 3D keypoints, the 6D pose of the object is recovered with a perspective-n-point (PnP) solver. Experiments on the LINEMOD and Occlusion LINEMOD datasets show that the proposed method surpasses state-of-the-art sparse-based methods in accuracy and is comparable to dense-based methods while incurring significantly lower memory and computational overheads.
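
The method itself is described only in the full text; as a rough illustration of the mechanism the abstract outlines, the Python sketch below weights mask pixels by a Gaussian data-field potential, lets only the highest-potential pixels cast PVNet-style RANSAC votes for each 2D keypoint, and recovers the pose with OpenCV's EPnP solver. All names and parameters here (`field_potential`, `vote_keypoint`, `sigma`, `top_frac`, the `(H, W, 2K)` layout of the predicted vector field) are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only -- function names, parameters, and the vector
# field layout are assumptions, not the DFW-PVNet reference implementation.
import numpy as np
import cv2

def field_potential(pixels, sigma=8.0):
    """Data-field potential of each pixel: a Gaussian-kernel sum over all
    mask pixels (data field theory), so pixels in dense regions of the
    object mask receive higher potential weights. O(N^2) memory, so a real
    implementation would subsample the mask first."""
    diff = pixels[:, None, :] - pixels[None, :, :]          # (N, N, 2)
    dist2 = np.sum(diff ** 2, axis=-1)
    return np.exp(-dist2 / (2.0 * sigma ** 2)).sum(axis=1)  # (N,)

def vote_keypoint(pixels, directions, n_hyp=128, inlier_cos=0.99, seed=0):
    """PVNet-style RANSAC voting: intersect random pairs of predicted unit
    vectors to hypothesise a 2D keypoint, and keep the hypothesis whose
    direction agrees with the most voters."""
    rng = np.random.default_rng(seed)
    best_hyp, best_score = None, -1
    for _ in range(n_hyp):
        i, j = rng.choice(len(pixels), size=2, replace=False)
        # Intersection of the rays p_i + t0*d_i and p_j + t1*d_j.
        A = np.stack([directions[i], -directions[j]], axis=1)
        try:
            t = np.linalg.solve(A, pixels[j] - pixels[i])
        except np.linalg.LinAlgError:
            continue                                        # parallel rays
        hyp = pixels[i] + t[0] * directions[i]
        to_hyp = hyp - pixels
        to_hyp /= np.linalg.norm(to_hyp, axis=1, keepdims=True) + 1e-8
        score = np.sum(np.sum(to_hyp * directions, axis=1) > inlier_cos)
        if score > best_score:
            best_hyp, best_score = hyp, score
    return best_hyp

def estimate_pose(mask, vector_field, kpts_3d, cam_K, top_frac=0.3):
    """Weight mask pixels by data-field potential, let only the highest-
    potential pixels vote for each 2D keypoint, then recover the 6D pose
    from the sparse 2D-3D correspondences with a PnP solver."""
    ys, xs = np.nonzero(mask)
    pixels = np.stack([xs, ys], axis=1).astype(np.float64)
    n_keep = max(2, int(len(pixels) * top_frac))
    keep = np.argsort(field_potential(pixels))[-n_keep:]    # high-potential pixels
    kpts_2d = []
    for k in range(len(kpts_3d)):                           # one vote per keypoint
        d = vector_field[ys[keep], xs[keep], 2 * k:2 * k + 2]
        d = d / (np.linalg.norm(d, axis=1, keepdims=True) + 1e-8)
        kpts_2d.append(vote_keypoint(pixels[keep], d))
    ok, rvec, tvec = cv2.solvePnP(                          # needs >= 4 keypoints
        np.asarray(kpts_3d, np.float64), np.asarray(kpts_2d, np.float64),
        cam_K, None, flags=cv2.SOLVEPNP_EPNP)
    return rvec, tvec                                       # pose in camera frame
```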

Data availability

The datasets for this study can be found in the online repository [https://bop.felk.cvut.cz/datasets/].


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant No. 61772061.

Author information

Contributions

Yinning Lu proposed the initial research idea, conducted the experiments, and wrote the initial manuscript draft. Songwei Pei supervised the work and revised the manuscript.

Corresponding author

Correspondence to Songwei Pei.

Ethics declarations

Ethics approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Competing interests

The authors declare that there is no conflict of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lu, Y., Pei, S. DFW-PVNet: data field weighting based pixel-wise voting network for effective 6D pose estimation. Appl Intell 55, 240 (2025). https://doi.org/10.1007/s10489-024-05942-9
