DOI: 10.1145/3284398.3284408
research-article

Weakly supervised 6D pose estimation for robotic grasping

Published: 02 December 2018

Abstract

Learning-based robotic grasping methods have made substantial progress with the development of deep neural networks. However, their need for large-scale real-world training data limits their range of application. Given 3D models of the target objects, we propose a new learning-based grasping approach built on 6D object pose estimation from a monocular RGB image. To reduce the difficulty of system deployment, we leverage both a large-scale synthesized 6D object pose dataset and a small real-world dataset with weak labels (e.g., the number of objects in the image). In particular, the deep network combines the 6D pose estimation task with an auxiliary task on the weak labels to transfer knowledge between the synthesized and real-world data. We demonstrate the effectiveness of the method in a real robotic environment and show substantial improvements in the grasping success rate (about 11.9% on average) due to the proposed knowledge transfer scheme.
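
As a rough illustration of the joint training idea described in the abstract, the sketch below combines a pose loss, computed only on samples that carry a 6D pose label (the synthesized data), with an auxiliary object-count classification loss computed on every sample (synthesized and real). It is a minimal sketch, not the paper's implementation: the flat pose vector, the squared-error pose loss, the aux_weight parameter, and all function and field names are assumptions made for illustration.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one integer label under a softmax over logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def pose_l2(pred, target):
    """Mean squared error between two flat pose vectors (assumed encoding)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def joint_loss(batch, aux_weight=0.5):
    """Pose loss on pose-labeled samples plus a weighted count loss on all samples.

    Each sample is a dict with hypothetical keys:
      count_logits, count : auxiliary weak-label task (number of objects)
      pose_pred, pose     : pose prediction and label; pose is None for
                            weakly labeled real images
    """
    pose_terms, count_terms = [], []
    for sample in batch:
        count_terms.append(
            softmax_cross_entropy(sample["count_logits"], sample["count"])
        )
        if sample.get("pose") is not None:
            pose_terms.append(pose_l2(sample["pose_pred"], sample["pose"]))
    pose = sum(pose_terms) / len(pose_terms) if pose_terms else 0.0
    count = sum(count_terms) / len(count_terms)
    return pose + aux_weight * count


# One synthesized sample with a pose label, one real sample with only a count.
example_batch = [
    {"pose_pred": [1.0, 0.0], "pose": [0.0, 0.0],
     "count_logits": [0.0, 0.0], "count": 0},
    {"pose": None, "count_logits": [0.0, 0.0], "count": 1},
]
print(joint_loss(example_batch))
```

Because the count term is evaluated on both kinds of data, gradients from the weakly labeled real images would reach the shared features even though those images contribute nothing to the pose term. That is one plausible reading of the knowledge transfer the abstract describes.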

Supplementary Material

MOV File (a26-li.mov)

Cited By

  • (2023) 6IMPOSE: bridging the reality gap in 6D pose estimation for robotic grasping. Frontiers in Robotics and AI 10. DOI: 10.3389/frobt.2023.1176492. Online publication date: 27-Sep-2023
  • (2021) Semantic Segmentation and 6DoF Pose Estimation using RGB-D Images and Deep Neural Networks. 2021 IEEE 30th International Symposium on Industrial Electronics (ISIE), 1-6. DOI: 10.1109/ISIE45552.2021.9576248. Online publication date: 20-Jun-2021

    Published In

    VRCAI '18: Proceedings of the 16th ACM SIGGRAPH International Conference on Virtual-Reality Continuum and its Applications in Industry
    December 2018
    200 pages
    ISBN: 9781450360876
    DOI: 10.1145/3284398

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. pose estimation
    2. robotic grasping
    3. weak supervision

    Qualifiers

    • Research-article

    Conference

    VRCAI '18

    Acceptance Rates

    Overall Acceptance Rate 51 of 107 submissions, 48%

    Article Metrics

    • Downloads (Last 12 months): 13
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 03 Mar 2025
