Abstract
This paper addresses 6D pose estimation for texture-less objects. Views of each object are rendered from its 3D model, so the pose of every projected view is known, and the orientation of the object is represented implicitly by a latent space learned from RGB images. The 3D rotation is then estimated by building a codebook of latent codes and matching against it in a template-matching architecture. To build the latent space from RGB images, the paper proposes a network based on a variant of the Adversarial Autoencoder (Makhzani et al. in Computer Science, 2015). The network is trained on data without pose annotations, and its encoder and decoder are not structurally symmetric. The encoder is inspired by existing models (Yang et al. in proceedings of IJCAI, 2018), (Yang et al. in proceedings of CVPR, 2019) that extract features from two different streams. From an RGB image, this network produces a latent feature vector that implicitly represents the object's orientation. Experimental results show that the method can estimate the 6D pose of objects and that its accuracy is better than that of an advanced existing method (Sundermeyer et al. in proceedings of ECCV, 2018).
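The codebook lookup described in the abstract can be illustrated with a minimal sketch. It assumes that latent codes have already been produced by some trained encoder (not shown here) and that codes are compared by cosine similarity; the abstract does not state the similarity measure, so that choice, along with the function names `normalize`, `build_codebook`, and `lookup_rotation`, is hypothetical and not the authors' implementation.

```python
import math


def normalize(v):
    """Return v scaled to unit length (assumed preprocessing for cosine similarity)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def build_codebook(latents, rotations):
    """Pair the unit-normalized latent code of each rendered view with its known rotation."""
    if len(latents) != len(rotations):
        raise ValueError("need exactly one rotation per latent code")
    return [(normalize(z), r) for z, r in zip(latents, rotations)]


def lookup_rotation(codebook, query, k=1):
    """Return the k (similarity, rotation) pairs whose codes best match the query code."""
    q = normalize(query)
    scored = [(sum(a * b for a, b in zip(z, q)), r) for z, r in codebook]
    # Sort on the similarity only, so rotations never need to be comparable.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy example: 2D latent codes and string labels stand in for real
# encoder outputs and 3x3 rotation matrices (both are placeholders).
codebook = build_codebook(
    [[1, 0], [0, 1], [-1, 0]],
    ["R_0deg", "R_90deg", "R_180deg"],
)
best = lookup_rotation(codebook, [0.9, 0.1], k=2)
```

In this toy setup the query code lies closest to the first codebook entry, so `best[0][1]` is `"R_0deg"`. In a real pipeline the codebook would hold one entry per rendered view of the 3D model, and the retrieved rotation would be combined with a separately estimated translation to form the 6D pose.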
References
Makhzani A, Shlens J, Jaitly N, Goodfellow I (2015) Adversarial autoencoders. Computer Science
Yang T-Y, Huang Y-H, Lin Y-Y, Hsiu P-C, Chuang Y-Y (2018) Ssr-net: a compact soft stagewise regression network for age estimation. In: proceedings of IJCAI, vol 5, no 6, p 7
Yang T-Y, Chen Y-T, Lin Y-Y, Chuang Y-Y (2019) Fsa-net: learning fine-grained structure aggregation for head pose estimation from a single image, In: proceedings of the IEEE conference on computer vision and pattern recognition, pp 1087–1096
Sundermeyer M, Marton Z-C, Durner M, Brucker M, Triebel R (2018) Implicit 3d orientation learning for 6d object detection from rgb images, In: proceedings of the European conference on computer vision (ECCV), pp 699–715
Makhataeva Z, Varol HA (2020) Augmented reality for robotics: a review. Robotics 9(2):21
Orbik J, Agostini A, Lee D (2021) Inverse reinforcement learning for dexterous hand manipulation, In: 2021 IEEE international conference on development and learning (ICDL). IEEE, pp 1–7
He Y, Sun W, Huang H, Liu J, Fan H, Sun J (2020) Pvn3d: a deep point-wise 3d keypoints voting network for 6dof pose estimation, In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11632–11641
Wang C, Martín-Martín R, Xu D, Lv J, Lu C, Fei-Fei L, Savarese S, Zhu Y (2020) 6-Pack: category-level 6d pose tracker with anchor-based keypoints, In: 2020 IEEE international conference on robotics and automation (ICRA). IEEE, pp 10059–10066
Gonzalez M, Kacete A, Murienne A, Marchand E (2020) Yoloff: you only learn offsets for robust 6dof object pose estimation, http://arxiv.org/abs/2002.00911
Yu X, Schmidt T, Narayanan V, Fox D (2018) Posecnn: a convolutional neural network for 6d object pose estimation in cluttered scenes, In: robotics: science and systems 2018
Tejani A, Tang D, Kouskouridas R, Kim TK (2014) Latent-class hough forests for 3d object detection and pose estimation, In: European conference on computer vision
Kehl W, Milletari F, Tombari F, Ilic S, Navab N (2016) Deep learning of local rgb-d patches for 3d object detection and 6d pose estimation, In: European conference on computer vision
Drost B, Ulrich M, Navab N, Ilic S (2010) Model globally, match locally: Efficient and robust 3d object recognition, In: computer vision & pattern recognition
Sun S, Liu R, Du Q, Sun S (2020) Selective embedding with gated fusion for 6d object pose estimation. Neural Process Lett 51:2417–2436
Hong C, Yu J, Zhang J, Jin X, Lee K-H (2018) Multimodal face-pose estimation with multitask manifold deep learning. IEEE Trans Ind Inform 15(7):3952–3961
Hong C, Yu J, Wan J, Tao D, Wang M (2015) Multimodal deep autoencoder for human pose recovery. IEEE Trans Image Process 24(12):5659–5670
Zhang S, Wang T, Cao J, Liu J (2022) Multichannel matrix randomized autoencoder. Neural Process Lett. https://doi.org/10.1007/s11063-022-11134-8
Li S, Koo S, Lee D (2015) Real-time and model-free object tracking using particle filter with joint color-spatial descriptor, In: 2015 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, pp 6079–6085
Hodan T, Haluza P, Obdržálek Š, Matas J, Lourakis M, Zabulis X (2017) T-less: an rgb-d dataset for 6d pose estimation of texture-less objects, In: 2017 IEEE winter conference on applications of computer vision (WACV). IEEE, pp 880–888
Du G, Wang K, Lian S, Zhao K (2020) Vision-based robotic grasping from object localization, object pose estimation to grasp estimation for parallel grippers: a review. Artif Intell Rev 54(3):1677–1734
Zakharov S, Shugurov I, Ilic S (2019) Dpod: 6d pose object detector and refiner, In: proceedings of the IEEE international conference on computer vision, pp 1941–1950
Hu Y, Fua P, Wang W, Salzmann M (2020) Single-stage 6d object pose estimation, In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2930–2939
Tekin B, Sinha SN, Fua P (2018) Real-time seamless single shot 6d object pose prediction, In: proceedings of the IEEE conference on computer vision and pattern recognition, pp 292–301
Hu Y, Hugonot J, Fua P, Salzmann M (2019) Segmentation-driven 6d object pose estimation, In: proceedings of the IEEE conference on computer vision and pattern recognition, pp 3385–3394
Peng S, Liu Y, Huang Q, Zhou X, Bao H (2019) Pvnet: pixel-wise voting network for 6dof pose estimation, In: 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR)
Wang C, Xu D, Zhu Y, Martín-Martín R, Lu C, Fei-Fei L, Savarese S (2019) Densefusion: 6d object pose estimation by iterative dense fusion, In: proceedings of the IEEE conference on computer vision and pattern recognition, pp 3343–3352
Zhu A, Yang J, Zhao W, Cao Z (2020) Lrf-net: learning local reference frames for 3d local shape description and matching. Sensors 20(18):5086
Yu X, Zhuang Z, Koniusz P, Li H (2020) 6dof object pose estimation via differentiable proxy voting loss, http://arxiv.org/abs/2002.03923
Mellado N, Aiger D, Mitra NJ (2014) Super 4pcs fast global pointcloud registration via smart indexing, In: computer graphics forum, vol 33, no 5. Wiley Online Library, pp 205–215
Yang J, Li H, Campbell D, Jia Y (2015) Go-icp: a globally optimal solution to 3d icp point-set registration. IEEE Trans Pattern Anal Mach Intell 38(11):2241–2254
Gao G, Lauri M, Wang Y, Hu X, Zhang J, Frintrop S (2020) 6d Object pose regression via supervised learning on point clouds, pp 3643–3649
Yu J, Tan M, Zhang H, Rui Y, Tao D (2019) Hierarchical deep click feature prediction for fine-grained image recognition. IEEE Trans Pattern Anal Mach Intell 44(2):563–578
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization, http://arxiv.org/abs/1412.6980
Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. J Mach Learn Res 9:249–256
Hinterstoisser S, Benhimane S, Lepetit V, Fua P, Navab N (2008) Simultaneous recognition and homography extraction of local patches with a simple linear classifier. In: BMVC. Citeseer, pp 1–10
Everingham M, Eslami SA, Van Gool L, Williams CK, Winn J, Zisserman A (2015) The pascal visual object classes challenge: a retrospective. Int J Comput Vis 111(1):98–136
Drost B, Ulrich M, Navab N, Ilic S (2010) Model globally, match locally: Efficient and robust 3d object recognition, In: 2010 IEEE computer society conference on computer vision and pattern recognition. IEEE, pp 998–1005
Choi C, Christensen HI (2012) 3d pose estimation of daily objects using an rgb-d camera, In: 2012 IEEE/RSJ international conference on intelligent robots and systems. IEEE, pp 3342–3349
Shotton J, Glocker B, Zach C, Izadi S, Criminisi A, Fitzgibbon A (2013) Scene coordinate regression forests for camera relocalization in rgb-d images, In: proceedings of the IEEE conference on computer vision and pattern recognition, pp 2930–2937
Hodaň T, Matas J, Obdržálek Š (2016) On evaluation of 6d object pose estimation, In: European conference on computer vision. Springer, pp 606–619
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) Ssd: single shot multibox detector, In: European conference on computer vision. Springer, pp 21–37
Acknowledgements
This work has been partially supported by the Helmholtz Association and the Oversea Study Program of Guangzhou Elite Project.
About this article
Cite this article
Huang, D., Ahn, H., Li, S. et al. Estimation of 6D Pose of Objects Based on a Variant Adversarial Autoencoder. Neural Process Lett 55, 9581–9596 (2023). https://doi.org/10.1007/s11063-023-11215-2
DOI: https://doi.org/10.1007/s11063-023-11215-2