Weakly supervised 6D pose estimation for robotic grasping

Authors:

Zhanpeng Zhang,

Xiaogang Wang

VRCAI '18: Proceedings of the 16th ACM SIGGRAPH International Conference on Virtual-Reality Continuum and its Applications in Industry

Article No.: 26, Pages 1 - 8

https://doi.org/10.1145/3284398.3284408

Published: 02 December 2018

Abstract

Learning-based robotic grasping methods have made substantial progress with the development of deep neural networks. However, their need for large-scale real-world training data limits their scope of application. Given 3D models of the target objects, we propose a new learning-based grasping approach built on 6D object pose estimation from a monocular RGB image. To reduce the difficulty of system deployment, we leverage both a large-scale synthesized 6D object pose dataset and a small real-world dataset with weak labels (e.g., the number of objects in the image). In particular, the deep network combines the 6D pose estimation task with an auxiliary task on the weak labels to transfer knowledge between the synthesized and real-world data. We demonstrate the effectiveness of the method in a real robotic environment and show substantial improvements in the grasping success rate (about 11.9% on average) attributable to the proposed knowledge transfer scheme.
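
As an illustration only, the joint training idea in the abstract can be sketched as a loss that applies a pose term where full 6D pose labels exist (synthesized images) and an auxiliary term where only a weak label such as the object count exists (real images). The abstract does not state the actual loss functions, pose parameterization, or weighting, so the squared-error terms, the 6-vector pose, the `aux_weight` parameter, and all names below are assumptions made for this sketch, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Sample:
    """One training image with network outputs and whatever labels it has.

    All fields are hypothetical placeholders for this sketch.
    """
    pred_pose: Sequence[float]           # assumed 6-vector predicted by the network
    pred_count: float                    # assumed predicted number of objects
    gt_pose: Optional[Sequence[float]]   # full pose label (synthesized data only)
    gt_count: Optional[int]              # weak label, e.g. object count


def pose_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    # Mean squared error over pose components (an assumed choice).
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def count_loss(pred: float, target: int) -> float:
    # Squared error on the object count (an assumed auxiliary task loss).
    return (pred - target) ** 2


def joint_loss(batch: Sequence[Sample], aux_weight: float = 0.5) -> float:
    """Average loss over a mixed batch of synthesized and real samples.

    Each sample contributes only the terms for which it has labels, so the
    shared network receives gradients from both data sources.
    """
    if not batch:
        raise ValueError("batch must not be empty")
    total = 0.0
    for s in batch:
        if s.gt_pose is not None:
            total += pose_loss(s.pred_pose, s.gt_pose)
        if s.gt_count is not None:
            total += aux_weight * count_loss(s.pred_count, s.gt_count)
    return total / len(batch)
```

In this sketch a real image with only a count label still influences the shared features through the auxiliary term, which is one plausible reading of how weak labels could support transfer from synthesized to real data.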

Supplementary Material

MOV File (a26-li.mov), 8.61 MB

Cited By

Cao H, Dirnberger L, Bernardini D, Piazza C, Caccamo M. (2023). 6IMPOSE: bridging the reality gap in 6D pose estimation for robotic grasping. Frontiers in Robotics and AI, 10. Online publication date: 27-Sep-2023.
https://doi.org/10.3389/frobt.2023.1176492
Tran V, Lin H. (2021). Semantic Segmentation and 6DoF Pose Estimation using RGB-D Images and Deep Neural Networks. 2021 IEEE 30th International Symposium on Industrial Electronics (ISIE), 1-6. Online publication date: 20-Jun-2021.
https://doi.org/10.1109/ISIE45552.2021.9576248

Index Terms

Weakly supervised 6D pose estimation for robotic grasping
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision

Published In

VRCAI '18: Proceedings of the 16th ACM SIGGRAPH International Conference on Virtual-Reality Continuum and its Applications in Industry

December 2018

200 pages

ISBN:9781450360876

DOI:10.1145/3284398

Conference Chairs:
Koji Mikami (Tokyo University of Technology, Japan),
Zhigeng Pan (Hangzhou Normal University, China),
Matt Adcock (Australian National University, Australia),
Daniel Thalmann (EPFL, Switzerland)

Program Chairs:
Xubo Yang (Shanghai Jiao Tong University, China),
Tomoki Itamiya (Aichi University of Technology, Japan),
Enhua Wu (IOS/CAS & University of Macau, China)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGGRAPH: ACM Special Interest Group on Computer Graphics and Interactive Techniques

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

VRCAI '18

Sponsor:

SIGGRAPH

VRCAI '18: International Conference on Virtual Reality Continuum and its Applications in Industry

December 2 - 3, 2018

Tokyo, Japan

Acceptance Rates

Overall Acceptance Rate 51 of 107 submissions, 48%

Bibliometrics

Total Citations: 2
Total Downloads: 305
Downloads (Last 12 months): 13
Downloads (Last 6 weeks): 0

Reflects downloads up to 03 Mar 2025
