Abstract
Recent progress in computer vision has been driven by high-capacity deep convolutional neural network (CNN) models trained on large generic datasets. However, creating large datasets with dense pixel-level labels is extremely costly. In this paper, we focus on the problem of instance segmentation for robotic manipulation using rich image and depth features. To avoid labor-intensive human labeling, we develop an automated rendering pipeline for rapidly generating labeled datasets. Given 3D object models as input, the pipeline produces photorealistic images with pixel-accurate semantic label maps and depth maps. The synthetic dataset is then used to train an RGB-D segmentation model that extends the Mask R-CNN framework with depth input fusion. Our results open up new possibilities for advancing robotic perception with cheap, large-scale synthetic data.
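As a rough illustration of the kind of rendering pipeline the abstract describes, the sketch below uses Blender's Python API (bpy, Blender 2.8+ pass and socket names) to enable an object-index pass for pixel-accurate label maps and a depth pass alongside the rendered image. The output directory and per-object indexing scheme are assumptions for illustration; the paper's actual pipeline is not reproduced here.

```python
# Hedged sketch: render an image plus label and depth maps in Blender (bpy).
# Runs inside Blender's bundled Python; assumes the scene already contains
# the loaded 3D object models and a camera.
import bpy

scene = bpy.context.scene
scene.render.engine = 'CYCLES'            # path tracer for photorealism
view_layer = scene.view_layers[0]
view_layer.use_pass_object_index = True   # per-object index -> label map
view_layer.use_pass_z = True              # per-pixel depth

# Give each mesh a unique pass index (0 is left for the background),
# so the IndexOB pass becomes a pixel-accurate semantic label map.
for i, obj in enumerate(scene.objects, start=1):
    if obj.type == 'MESH':
        obj.pass_index = i

# Route the passes to files via the compositor.
scene.use_nodes = True
tree = scene.node_tree
tree.nodes.clear()
rl = tree.nodes.new('CompositorNodeRLayers')
out = tree.nodes.new('CompositorNodeOutputFile')
out.base_path = '/tmp/render'             # assumed output directory
out.format.file_format = 'OPEN_EXR'       # float output; avoids 8-bit clamping of depth
out.file_slots.new('label')
out.file_slots.new('depth')
tree.links.new(rl.outputs['Image'], out.inputs['Image'])
tree.links.new(rl.outputs['IndexOB'], out.inputs['label'])
tree.links.new(rl.outputs['Depth'], out.inputs['depth'])

bpy.ops.render.render(write_still=True)
```

Likewise, here is a minimal sketch of one common way to fuse depth into Mask R-CNN: early fusion via a fourth input channel on a torchvision model. The abstract does not specify the fusion architecture, and the depth normalization statistics (0.5 / 0.25) below are placeholder assumptions, so treat this as an illustration of the general idea rather than the authors' method.

```python
# Hedged sketch: early RGB-D fusion for Mask R-CNN using torchvision.
import torch
import torch.nn as nn
import torchvision

# Two classes for class-agnostic "objectness": object vs. background.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=2)

# Replace the 3-channel stem conv with a 4-channel one (RGB + depth),
# copying the existing RGB filters and initializing the depth channel
# with their mean.
old_conv = model.backbone.body.conv1
new_conv = nn.Conv2d(4, old_conv.out_channels,
                     kernel_size=old_conv.kernel_size,
                     stride=old_conv.stride,
                     padding=old_conv.padding,
                     bias=False)
with torch.no_grad():
    new_conv.weight[:, :3] = old_conv.weight
    new_conv.weight[:, 3:] = old_conv.weight.mean(dim=1, keepdim=True)
model.backbone.body.conv1 = new_conv

# The built-in transform normalizes 3-channel input; extend its statistics
# with assumed depth mean/std (in practice these are dataset-dependent).
model.transform.image_mean = [0.485, 0.456, 0.406, 0.5]
model.transform.image_std = [0.229, 0.224, 0.225, 0.25]

# Forward pass on a dummy 4-channel RGB-D image.
model.eval()
with torch.no_grad():
    preds = model([torch.rand(4, 480, 640)])
print(preds[0]["masks"].shape)  # [num_detections, 1, H, W]
```

Early fusion keeps the rest of the detector unchanged; mid- or late-fusion variants would instead run a second backbone on the depth map and merge the two feature streams.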
Acknowledgments
This research has been supported by General Research Fund 16207316 from the Research Grants Council of Hong Kong.
Electronic supplementary material
Supplementary material 1 (mp4, 2555 KB)
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, S., Zhou, J., Jia, Z., Yeung, D.Y., Mason, M.T. (2020). Learning Accurate Objectness Instance Segmentation from Photorealistic Rendering for Robotic Manipulation. In: Xiao, J., Kröger, T., Khatib, O. (eds) Proceedings of the 2018 International Symposium on Experimental Robotics. ISER 2018. Springer Proceedings in Advanced Robotics, vol 11. Springer, Cham. https://doi.org/10.1007/978-3-030-33950-0_22
DOI: https://doi.org/10.1007/978-3-030-33950-0_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33949-4
Online ISBN: 978-3-030-33950-0
eBook Packages: Intelligent Technologies and Robotics (R0)