Estimation of 6D Pose of Objects Based on a Variant Adversarial Autoencoder

Neural Processing Letters

Abstract

This paper aims to estimate the 6D pose of objects from a texture-less dataset. The pose of each projected view is obtained by rendering the 3D model of each object, and the orientation of the object is then represented implicitly by a latent space learned from RGB images. The 3D rotation of the object is estimated by building a codebook within a template-matching framework. To construct the latent space from RGB images, this paper proposes a network based on a variant of the Adversarial Autoencoder (Makhzani et al. in Computer Science, 2015). The network is trained on a dataset without pose annotations, and its encoder and decoder are not structurally symmetric. The encoder is inspired by existing models (Yang et al. in proceedings of IJCAI, 2018; Yang et al. in proceedings of CVPR, 2019) and extracts features from two different streams. With this network, a latent feature vector that implicitly represents the orientation of the object is obtained from the RGB image. Experimental results show that the proposed method can estimate the 6D pose of objects and achieves higher accuracy than the state-of-the-art method of Sundermeyer et al. (proceedings of ECCV, 2018).
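
The codebook step described above can be illustrated with a minimal sketch. It assumes, as in the Augmented Autoencoder of Sundermeyer et al., that the latent code of every rendered view is stored alongside that view's rotation and that a query is matched by cosine similarity; the abstract does not state the exact similarity measure used in this paper. `toy_encode` is a hypothetical stand-in for the trained encoder (a fixed random linear projection of the flattened rotation matrix), and all function names here are illustrative only.

```python
import numpy as np


def random_rotations(n, rng):
    """Sample n proper 3x3 rotation matrices (stand-in for rendered view poses)."""
    mats = []
    for _ in range(n):
        q, r = np.linalg.qr(rng.normal(size=(3, 3)))
        q = q @ np.diag(np.sign(np.diag(r)))  # make the QR factorisation unique
        if np.linalg.det(q) < 0:
            q[:, 0] = -q[:, 0]  # flip one axis so det = +1 (a rotation, not a reflection)
        mats.append(q)
    return np.stack(mats)


def normalise(x):
    """Scale vectors to unit length along the last axis."""
    x = np.asarray(x, dtype=np.float64)
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), 1e-12)


def build_codebook(encode, rotations):
    """Encode every rendered view; row i is the unit latent code of rotations[i]."""
    return normalise(np.stack([encode(R) for R in rotations]))


def lookup_rotation(codebook, rotations, query_code):
    """Return (index, rotation, cosine similarity) of the best-matching codebook entry."""
    sims = codebook @ normalise(query_code)  # cosine similarity, since rows are unit length
    best = int(np.argmax(sims))
    return best, rotations[best], float(sims[best])


# Hypothetical encoder: a fixed random linear projection of the flattened rotation.
# In the paper, codes would instead come from the trained encoder applied to RGB views.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 9))


def toy_encode(R):
    return W @ R.reshape(-1)


rotations = random_rotations(500, rng)
codebook = build_codebook(toy_encode, rotations)
idx, R_est, sim = lookup_rotation(codebook, rotations, toy_encode(rotations[42]))
print(idx, round(sim, 6))
```

Normalising the codes once when the codebook is built reduces cosine similarity to a single matrix-vector product, which keeps lookup cheap even for thousands of views. In a full pipeline the query code would come from encoding an RGB crop of a detected object, and a nearest-neighbour estimate can be no finer than the sampling density of the rendered views.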

References

  1. Makhzani A, Shlens J, Jaitly N, Goodfellow I (2015) Adversarial autoencoders. Computer Science

  2. Yang T-Y, Huang Y-H, Lin Y-Y, Hsiu P-C, Chuang Y-Y (2018) SSR-Net: a compact soft stagewise regression network for age estimation. In: Proceedings of IJCAI, vol 5, no 6, p 7

  3. Yang T-Y, Chen Y-T, Lin Y-Y, Chuang Y-Y (2019) FSA-Net: learning fine-grained structure aggregation for head pose estimation from a single image. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1087–1096

  4. Sundermeyer M, Marton Z-C, Durner M, Brucker M, Triebel R (2018) Implicit 3D orientation learning for 6D object detection from RGB images. In: Proceedings of the European conference on computer vision (ECCV), pp 699–715

  5. Makhataeva Z, Varol HA (2020) Augmented reality for robotics: a review. Robotics 9(2):21

  6. Orbik J, Agostini A, Lee D (2021) Inverse reinforcement learning for dexterous hand manipulation. In: 2021 IEEE international conference on development and learning (ICDL). IEEE, pp 1–7

  7. He Y, Sun W, Huang H, Liu J, Fan H, Sun J (2020) PVN3D: a deep point-wise 3D keypoints voting network for 6DoF pose estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11632–11641

  8. Wang C, Martín-Martín R, Xu D, Lv J, Lu C, Fei-Fei L, Savarese S, Zhu Y (2020) 6-PACK: category-level 6D pose tracker with anchor-based keypoints. In: 2020 IEEE international conference on robotics and automation (ICRA). IEEE, pp 10059–10066

  9. Gonzalez M, Kacete A, Murienne A, Marchand E (2020) YOLOff: you only learn offsets for robust 6DoF object pose estimation. http://arxiv.org/abs/2002.00911

  10. Xiang Y, Schmidt T, Narayanan V, Fox D (2018) PoseCNN: a convolutional neural network for 6D object pose estimation in cluttered scenes. In: Robotics: science and systems 2018

  11. Tejani A, Tang D, Kouskouridas R, Kim TK (2014) Latent-class Hough forests for 3D object detection and pose estimation. In: European conference on computer vision

  12. Kehl W, Milletari F, Tombari F, Ilic S, Navab N (2016) Deep learning of local RGB-D patches for 3D object detection and 6D pose estimation. In: European conference on computer vision

  13. Drost B, Ulrich M, Navab N, Ilic S (2010) Model globally, match locally: efficient and robust 3D object recognition. In: Computer vision & pattern recognition

  14. Sun S, Liu R, Du Q, Sun S (2020) Selective embedding with gated fusion for 6D object pose estimation. Neural Process Lett 51:2417–2436

  15. Hong C, Yu J, Zhang J, Jin X, Lee K-H (2018) Multimodal face-pose estimation with multitask manifold deep learning. IEEE Trans Ind Inform 15(7):3952–3961

  16. Hong C, Yu J, Wan J, Tao D, Wang M (2015) Multimodal deep autoencoder for human pose recovery. IEEE Trans Image Process 24(12):5659–5670

  17. Zhang S, Wang T, Cao J, Liu J (2022) Multichannel matrix randomized autoencoder. Neural Process Lett. https://doi.org/10.1007/s11063-022-11134-8

  18. Li S, Koo S, Lee D (2015) Real-time and model-free object tracking using particle filter with joint color-spatial descriptor. In: 2015 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, pp 6079–6085

  19. Hodaň T, Haluza P, Obdržálek Š, Matas J, Lourakis M, Zabulis X (2017) T-LESS: an RGB-D dataset for 6D pose estimation of texture-less objects. In: 2017 IEEE winter conference on applications of computer vision (WACV). IEEE, pp 880–888

  20. Du G, Wang K, Lian S, Zhao K (2020) Vision-based robotic grasping from object localization, object pose estimation to grasp estimation for parallel grippers: a review. Artif Intell Rev 54(3):1677–1734

  21. Zakharov S, Shugurov I, Ilic S (2019) DPOD: 6D pose object detector and refiner. In: Proceedings of the IEEE international conference on computer vision, pp 1941–1950

  22. Hu Y, Fua P, Wang W, Salzmann M (2020) Single-stage 6D object pose estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2930–2939

  23. Tekin B, Sinha SN, Fua P (2018) Real-time seamless single shot 6D object pose prediction. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 292–301

  24. Hu Y, Hugonot J, Fua P, Salzmann M (2019) Segmentation-driven 6D object pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3385–3394

  25. Peng S, Liu Y, Huang Q, Zhou X, Bao H (2019) PVNet: pixel-wise voting network for 6DoF pose estimation. In: 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR)

  26. Wang C, Xu D, Zhu Y, Martín-Martín R, Lu C, Fei-Fei L, Savarese S (2019) DenseFusion: 6D object pose estimation by iterative dense fusion. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3343–3352

  27. Zhu A, Yang J, Zhao W, Cao Z (2020) LRF-Net: learning local reference frames for 3D local shape description and matching. Sensors 20(18):5086

  28. Yu X, Zhuang Z, Koniusz P, Li H (2020) 6DoF object pose estimation via differentiable proxy voting loss. http://arxiv.org/abs/2002.03923

  29. Mellado N, Aiger D, Mitra NJ (2014) Super 4PCS: fast global point cloud registration via smart indexing. In: Computer graphics forum, vol 33, no 5. Wiley Online Library, pp 205–215

  30. Yang J, Li H, Campbell D, Jia Y (2015) Go-ICP: a globally optimal solution to 3D ICP point-set registration. IEEE Trans Pattern Anal Mach Intell 38(11):2241–2254

  31. Gao G, Lauri M, Wang Y, Hu X, Zhang J, Frintrop S (2020) 6D object pose regression via supervised learning on point clouds, pp 3643–3649

  32. Yu J, Tan M, Zhang H, Rui Y, Tao D (2019) Hierarchical deep click feature prediction for fine-grained image recognition. IEEE Trans Pattern Anal Mach Intell 44(2):563–578

  33. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. http://arxiv.org/abs/1412.6980

  34. Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. J Mach Learn Res 9:249–256

  35. Hinterstoisser S, Benhimane S, Lepetit V, Fua P, Navab N (2008) Simultaneous recognition and homography extraction of local patches with a simple linear classifier. In: BMVC. Citeseer, pp 1–10

  36. Everingham M, Eslami SA, Van Gool L, Williams CK, Winn J, Zisserman A (2015) The PASCAL visual object classes challenge: a retrospective. Int J Comput Vis 111(1):98–136

  37. Drost B, Ulrich M, Navab N, Ilic S (2010) Model globally, match locally: efficient and robust 3D object recognition. In: 2010 IEEE computer society conference on computer vision and pattern recognition. IEEE, pp 998–1005

  38. Choi C, Christensen HI (2012) 3D pose estimation of daily objects using an RGB-D camera. In: 2012 IEEE/RSJ international conference on intelligent robots and systems. IEEE, pp 3342–3349

  39. Shotton J, Glocker B, Zach C, Izadi S, Criminisi A, Fitzgibbon A (2013) Scene coordinate regression forests for camera relocalization in RGB-D images. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2930–2937

  40. Hodaň T, Matas J, Obdržálek Š (2016) On evaluation of 6D object pose estimation. In: European conference on computer vision. Springer, pp 606–619

  41. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) SSD: single shot multibox detector. In: European conference on computer vision. Springer, pp 21–37

Acknowledgements

This work has been partially supported by Helmholtz Association and the Oversea Study Program of Guangzhou Elite Project.

Author information

Corresponding author

Correspondence to Dan Huang.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Huang, D., Ahn, H., Li, S. et al. Estimation of 6D Pose of Objects Based on a Variant Adversarial Autoencoder. Neural Process Lett 55, 9581–9596 (2023). https://doi.org/10.1007/s11063-023-11215-2

  • DOI: https://doi.org/10.1007/s11063-023-11215-2
