Abstract
Recent progress in computer vision has been driven by high-capacity deep convolutional neural network (CNN) models trained on large generic datasets. However, creating large datasets with dense pixel-level labels is extremely costly. In this paper, we focus on the problem of instance segmentation for robotic manipulation using rich image and depth features. To avoid labor-intensive human labeling, we develop an automated rendering pipeline for rapidly generating labeled datasets. Given 3D object models as input, the pipeline produces photorealistic images with pixel-accurate semantic label maps and depth maps. The synthetic dataset is then used to train an RGB-D segmentation model that extends the Mask R-CNN framework with depth input fusion. Our results open up new possibilities for advancing robotic perception with cheap, large-scale synthetic data.
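As a rough illustration of the kind of rendering pipeline the abstract describes, the sketch below uses Blender's Python API (bpy, Blender 2.8+ pass and socket names) to enable an object-index pass for pixel-accurate label maps and a depth pass alongside the rendered image. The output directory and per-object indexing scheme are assumptions for illustration; the paper's actual pipeline is not reproduced here.

```python
# Hedged sketch: render an image plus label and depth maps in Blender (bpy).
# Runs inside Blender's bundled Python; assumes the scene already contains
# the loaded 3D object models and a camera.
import bpy

scene = bpy.context.scene
scene.render.engine = 'CYCLES'            # path tracer for photorealism
view_layer = scene.view_layers[0]
view_layer.use_pass_object_index = True   # per-object index -> label map
view_layer.use_pass_z = True              # per-pixel depth

# Give each mesh a unique pass index (0 is left for the background),
# so the IndexOB pass becomes a pixel-accurate semantic label map.
for i, obj in enumerate(scene.objects, start=1):
    if obj.type == 'MESH':
        obj.pass_index = i

# Route the passes to files via the compositor.
scene.use_nodes = True
tree = scene.node_tree
tree.nodes.clear()
rl = tree.nodes.new('CompositorNodeRLayers')
out = tree.nodes.new('CompositorNodeOutputFile')
out.base_path = '/tmp/render'             # assumed output directory
out.format.file_format = 'OPEN_EXR'       # float output; avoids 8-bit clamping of depth
out.file_slots.new('label')
out.file_slots.new('depth')
tree.links.new(rl.outputs['Image'], out.inputs['Image'])
tree.links.new(rl.outputs['IndexOB'], out.inputs['label'])
tree.links.new(rl.outputs['Depth'], out.inputs['depth'])

bpy.ops.render.render(write_still=True)
```

Likewise, here is a minimal sketch of one common way to fuse depth into Mask R-CNN: early fusion via a fourth input channel on a torchvision model. The abstract does not specify the fusion architecture, and the depth normalization statistics (0.5 / 0.25) below are placeholder assumptions, so treat this as an illustration of the general idea rather than the authors' method.

```python
# Hedged sketch: early RGB-D fusion for Mask R-CNN using torchvision.
import torch
import torch.nn as nn
import torchvision

# Two classes for class-agnostic "objectness": object vs. background.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=2)

# Replace the 3-channel stem conv with a 4-channel one (RGB + depth),
# copying the existing RGB filters and initializing the depth channel
# with their mean.
old_conv = model.backbone.body.conv1
new_conv = nn.Conv2d(4, old_conv.out_channels,
                     kernel_size=old_conv.kernel_size,
                     stride=old_conv.stride,
                     padding=old_conv.padding,
                     bias=False)
with torch.no_grad():
    new_conv.weight[:, :3] = old_conv.weight
    new_conv.weight[:, 3:] = old_conv.weight.mean(dim=1, keepdim=True)
model.backbone.body.conv1 = new_conv

# The built-in transform normalizes 3-channel input; extend its statistics
# with assumed depth mean/std (in practice these are dataset-dependent).
model.transform.image_mean = [0.485, 0.456, 0.406, 0.5]
model.transform.image_std = [0.229, 0.224, 0.225, 0.25]

# Forward pass on a dummy 4-channel RGB-D image.
model.eval()
with torch.no_grad():
    preds = model([torch.rand(4, 480, 640)])
print(preds[0]["masks"].shape)  # [num_detections, 1, H, W]
```

Early fusion keeps the rest of the detector unchanged; mid- or late-fusion variants would instead run a second backbone on the depth map and merge the two feature streams.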
Acknowledgments
This research has been supported by General Research Fund 16207316 from the Research Grants Council of Hong Kong.
Electronic supplementary material
Supplementary material 1 (mp4, 2555 KB)
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, S., Zhou, J., Jia, Z., Yeung, D.Y., Mason, M.T. (2020). Learning Accurate Objectness Instance Segmentation from Photorealistic Rendering for Robotic Manipulation. In: Xiao, J., Kröger, T., Khatib, O. (eds) Proceedings of the 2018 International Symposium on Experimental Robotics. ISER 2018. Springer Proceedings in Advanced Robotics, vol 11. Springer, Cham. https://doi.org/10.1007/978-3-030-33950-0_22
DOI: https://doi.org/10.1007/978-3-030-33950-0_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33949-4
Online ISBN: 978-3-030-33950-0
eBook Packages: Intelligent Technologies and Robotics (R0)