Estimation of 6D Pose of Objects Based on a Variant Adversarial Autoencoder

Huang, Dan; Ahn, Hyemin; Li, Shile; Hu, Yueming; Lee, Dongheui

doi:10.1007/s11063-023-11215-2

Published: 14 March 2023

Volume 55, pages 9581–9596, (2023)

Neural Processing Letters

Dan Huang¹,
Hyemin Ahn²,
Shile Li²,
Yueming Hu³ &
Dongheui Lee²

Abstract

The goal of this paper is to estimate the 6D pose of objects on a dataset of texture-less objects. The pose of each projection view is obtained by rendering the 3D model of each object, and the orientation of the object is then represented implicitly in a latent space learned from RGB images. The 3D rotation of the object is estimated by building a codebook of latent vectors and matching against it, following a template matching architecture. To build the latent space from the RGB images, this paper proposes a network based on a variant of the Adversarial Autoencoder (Makhzani et al. in Computer Science, 2015). The network is trained on data without pose annotations, and the encoder and decoder are not structurally symmetric. The encoder is inspired by existing models (Yang et al. in proceedings of IJCAI, 2018), (Yang et al. in proceedings of CVPR, 2019) that extract features from two different streams. With this network, a latent feature vector that implicitly represents the orientation of the object is obtained from the RGB image. Experimental results show that the proposed method can estimate the 6D pose of the object and that its accuracy is better than that of the advanced method (Sundermeyer et al. in proceedings of ECCV, 2018).
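
The codebook step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the encoder has already produced one latent vector per rendered view, and that matching uses cosine similarity, as in the approach of Sundermeyer et al. (2018). The function names `build_codebook` and `lookup_rotation`, the latent size, and the toy data are hypothetical.

```python
import numpy as np


def build_codebook(latents: np.ndarray) -> np.ndarray:
    """L2-normalise latent vectors; one row per rendered view of the 3D model."""
    norms = np.linalg.norm(latents, axis=1, keepdims=True)
    return latents / np.clip(norms, 1e-12, None)


def lookup_rotation(query: np.ndarray, codebook: np.ndarray,
                    rotations: np.ndarray, k: int = 1):
    """Return the k stored rotations whose codes best match the query.

    Similarity is the cosine between the query latent vector and each
    codebook row (an assumption; the abstract does not name the metric).
    """
    q = query / max(np.linalg.norm(query), 1e-12)
    scores = codebook @ q              # cosine similarity to every view
    top = np.argsort(-scores)[:k]      # indices of the best matches
    return rotations[top], scores[top]


# Toy data standing in for encoder outputs and rendered-view rotations.
rng = np.random.default_rng(0)
latents = rng.normal(size=(100, 32))          # hypothetical latent size 32
rotations = np.tile(np.eye(3), (100, 1, 1))   # placeholder 3x3 rotations
codebook = build_codebook(latents)

# A query latent vector would come from encoding a test RGB crop.
best_rotations, best_scores = lookup_rotation(latents[5], codebook, rotations, k=3)
```

Normalising the codebook once means each lookup reduces to a single matrix-vector product, which is why template matching of this kind stays cheap even with many rendered views.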

References

Makhzani A, Shlens J, Jaitly N, Goodfellow I (2015) Adversarial autoencoders. Computer Science
Yang T-Y, Huang Y-H, Lin Y-Y, Hsiu P-C, Chuang Y-Y (2018) Ssr-net: a compact soft stagewise regression network for age estimation. In: proceedings of IJCAI, vol 5, no 6, p 7
Yang T-Y, Chen Y-T, Lin Y-Y, Chuang Y-Y (2019) Fsa-net: learning fine-grained structure aggregation for head pose estimation from a single image. In: proceedings of the IEEE conference on computer vision and pattern recognition, pp 1087–1096
Sundermeyer M, Marton Z-C, Durner M, Brucker M, Triebel R (2018) Implicit 3d orientation learning for 6d object detection from rgb images. In: proceedings of the European conference on computer vision (ECCV), pp 699–715
Makhataeva Z, Varol HA (2020) Augmented reality for robotics: a review. Robotics 9(2):21
Orbik J, Agostini A, Lee D (2021) Inverse reinforcement learning for dexterous hand manipulation. In: 2021 IEEE international conference on development and learning (ICDL). IEEE, pp 1–7
He Y, Sun W, Huang H, Liu J, Fan H, Sun J (2020) Pvn3d: a deep point-wise 3d keypoints voting network for 6dof pose estimation. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11632–11641
Wang C, Martín-Martín R, Xu D, Lv J, Lu C, Fei-Fei L, Savarese S, Zhu Y (2020) 6-Pack: category-level 6d pose tracker with anchor-based keypoints. In: 2020 IEEE international conference on robotics and automation (ICRA). IEEE, pp 10059–10066
Gonzalez M, Kacete A, Murienne A, Marchand E (2020) Yoloff: you only learn offsets for robust 6dof object pose estimation. http://arxiv.org/abs/2002.00911
Yu X, Schmidt T, Narayanan V, Fox D (2018) Posecnn: a convolutional neural network for 6d object pose estimation in cluttered scenes. In: robotics: science and systems 2018
Tejani A, Tang D, Kouskouridas R, Kim TK (2014) Latent-class hough forests for 3d object detection and pose estimation. In: European conference on computer vision
Kehl W, Milletari F, Tombari F, Ilic S, Navab N (2016) Deep learning of local rgb-d patches for 3d object detection and 6d pose estimation. In: European conference on computer vision
Drost B, Ulrich M, Navab N, Ilic S (2010) Model globally, match locally: efficient and robust 3d object recognition. In: computer vision & pattern recognition
Sun S, Liu R, Du Q, Sun S (2020) Selective embedding with gated fusion for 6d object pose estimation. Neural Process Lett 51:2417–2436
Hong C, Yu J, Zhang J, Jin X, Lee K-H (2018) Multimodal face-pose estimation with multitask manifold deep learning. IEEE Trans Ind Inform 15(7):3952–3961
Hong C, Yu J, Wan J, Tao D, Wang M (2015) Multimodal deep autoencoder for human pose recovery. IEEE Trans Image Process 24(12):5659–5670
Zhang S, Wang T, Cao J, Liu J (2022) Multichannel matrix randomized autoencoder. Neural Process Lett. https://doi.org/10.1007/s11063-022-11134-8
Li S, Koo S, Lee D (2015) Real-time and model-free object tracking using particle filter with joint color-spatial descriptor. In: 2015 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, pp 6079–6085
Hodan T, Haluza P, Obdržálek Š, Matas J, Lourakis M, Zabulis X (2017) T-less: an rgb-d dataset for 6d pose estimation of texture-less objects. In: 2017 IEEE winter conference on applications of computer vision (WACV). IEEE, pp 880–888
Du G, Wang K, Lian S, Zhao K (2020) Vision-based robotic grasping from object localization, object pose estimation to grasp estimation for parallel grippers: a review. Artif Intell Rev 54(3):1677–1734
Zakharov S, Shugurov I, Ilic S (2019) Dpod: 6d pose object detector and refiner. In: proceedings of the IEEE international conference on computer vision, pp 1941–1950
Hu Y, Fua P, Wang W, Salzmann M (2020) Single-stage 6d object pose estimation. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2930–2939
Tekin B, Sinha SN, Fua P (2018) Real-time seamless single shot 6d object pose prediction. In: proceedings of the IEEE conference on computer vision and pattern recognition, pp 292–301
Hu Y, Hugonot J, Fua P, Salzmann M (2019) Segmentation-driven 6d object pose estimation. In: proceedings of the IEEE conference on computer vision and pattern recognition, pp 3385–3394
Peng S, Liu Y, Huang Q, Zhou X, Bao H (2019) Pvnet: pixel-wise voting network for 6dof pose estimation. In: 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR)
Wang C, Xu D, Zhu Y, Martín-Martín R, Lu C, Fei-Fei L, Savarese S (2019) Densefusion: 6d object pose estimation by iterative dense fusion. In: proceedings of the IEEE conference on computer vision and pattern recognition, pp 3343–3352
Zhu A, Yang J, Zhao W, Cao Z (2020) Lrf-net: learning local reference frames for 3d local shape description and matching. Sensors 20(18):5086
Yu X, Zhuang Z, Koniusz P, Li H (2020) 6dof object pose estimation via differentiable proxy voting loss. http://arxiv.org/abs/2002.03923
Mellado N, Aiger D, Mitra NJ (2014) Super 4pcs fast global pointcloud registration via smart indexing. In: computer graphics forum, vol 33, no 5. Wiley Online Library, pp 205–215
Yang J, Li H, Campbell D, Jia Y (2015) Go-icp: a globally optimal solution to 3d icp point-set registration. IEEE Trans Pattern Anal Mach Intell 38(11):2241–2254
Gao G, Lauri M, Wang Y, Hu X, Zhang J, Frintrop S (2020) 6d object pose regression via supervised learning on point clouds, pp 3643–3649
Yu J, Tan M, Zhang H, Rui Y, Tao D (2019) Hierarchical deep click feature prediction for fine-grained image recognition. IEEE Trans Pattern Anal Mach Intell 44(2):563–578
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. http://arxiv.org/abs/1412.6980
Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. J Mach Learn Res 9:249–256
Hinterstoisser S, Benhimane S, Lepetit V, Fua P, Navab N (2008) Simultaneous recognition and homography extraction of local patches with a simple linear classifier. In: BMVC. Citeseer, pp 1–10
Everingham M, Eslami SA, Van Gool L, Williams CK, Winn J, Zisserman A (2015) The pascal visual object classes challenge: a retrospective. Int J Comput Vis 111(1):98–136
Drost B, Ulrich M, Navab N, Ilic S (2010) Model globally, match locally: efficient and robust 3d object recognition. In: IEEE computer society conference on computer vision and pattern recognition. IEEE, pp 998–1005
Choi C, Christensen HI (2012) 3d pose estimation of daily objects using an rgb-d camera. In: 2012 IEEE/RSJ international conference on intelligent robots and systems. IEEE, pp 3342–3349
Shotton J, Glocker B, Zach C, Izadi S, Criminisi A, Fitzgibbon A (2013) Scene coordinate regression forests for camera relocalization in rgb-d images. In: proceedings of the IEEE conference on computer vision and pattern recognition, pp 2930–2937
Hodaň T, Matas J, Obdržálek Š (2016) On evaluation of 6d object pose estimation. In: European conference on computer vision. Springer, pp 606–619
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) Ssd: single shot multibox detector. In: European conference on computer vision. Springer, pp 21–37

Acknowledgements

This work has been partially supported by the Helmholtz Association and the Oversea Study Program of Guangzhou Elite Project.

Author information

Authors and Affiliations

School of Mechanical and Automotive Engineering, South China University of Technology, Guangzhou, 510640, China
Dan Huang
Department of Electrical and Computer Engineering, Technical University of Munich, 80333, Munich, Germany
Hyemin Ahn, Shile Li & Dongheui Lee
School of Automotive Science and Engineering, South China University of Technology, Guangzhou, 510640, China
Yueming Hu

Corresponding author

Correspondence to Dan Huang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Huang, D., Ahn, H., Li, S. et al. Estimation of 6D Pose of Objects Based on a Variant Adversarial Autoencoder. Neural Process Lett 55, 9581–9596 (2023). https://doi.org/10.1007/s11063-023-11215-2

Accepted: 26 February 2023
Published: 14 March 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s11063-023-11215-2
