Object Segmentation in Depth Maps with One User Click and a Synthetically Trained Fully Convolutional Network

Conference paper
Human Friendly Robotics

Part of the book series: Springer Proceedings in Advanced Robotics (SPAR, volume 7)

Abstract

With more and more household objects designed for planned obsolescence and consumed by a fast-growing population, hazardous waste recycling has become a critical challenge. Given the large variability of household waste, current recycling platforms mostly rely on human operators to analyze the scene, typically composed of many object instances piled up in bulk. Assisting these operators by robotizing the unitary extraction of objects is a key step toward speeding up this tedious process. Whereas supervised deep learning has proven highly effective for such object-level scene understanding, e.g., generic object detection and segmentation in everyday scenes, it requires large sets of per-pixel labeled images, which are rarely available in many application contexts, including industrial robotics. We thus propose a step towards a practical interactive application for generating an object-oriented robotic grasp, requiring as inputs only one depth map of the scene and one user click on the next object to extract. More precisely, this paper addresses the intermediate problem of object segmentation in top views of piles of bulk objects, given a pixel location, referred to as a seed, provided interactively by a human operator. We propose a twofold framework for generating edge-driven instance segments. First, we repurpose a state-of-the-art fully convolutional object contour detector for seed-based instance segmentation by introducing the notion of edge-mask duality with a novel patch-free, contour-oriented loss function. Second, we train the model using only synthetic scenes instead of manually labeled real data. Our experimental results show that training an encoder-decoder network with edge-mask duality, as proposed, outperforms a state-of-the-art patch-based network in this application context.
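
Since the full text is paywalled here, a short sketch may help make the abstract's pipeline concrete. Everything below is an illustrative assumption rather than the authors' implementation: the user click is encoded as a Gaussian heatmap and concatenated with the depth map as a second input channel of a small encoder-decoder, and the loss is a patch-free, contour-oriented variant of binary cross-entropy that up-weights the ground-truth mask's own boundary — one plausible reading of edge-mask duality, since the instance contour is recoverable from the mask via a morphological gradient, with no separate edge annotations or patch sampling. SeedSegNet, seed_heatmap, and contour_weighted_bce are hypothetical names.

```python
# Illustrative sketch only (hypothetical names; not the paper's released code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeedSegNet(nn.Module):
    """Toy encoder-decoder: depth map + seed heatmap -> instance mask logits."""
    def __init__(self, base=16):
        super().__init__()
        self.enc1 = nn.Sequential(
            nn.Conv2d(2, base, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(base, base, 3, padding=1), nn.ReLU(inplace=True))
        self.enc2 = nn.Sequential(
            nn.Conv2d(base, base * 2, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(base, 1, 3, padding=1))  # per-pixel logits of the clicked mask

    def forward(self, depth, seed):
        # Seed conditioning: the user click enters as an extra input channel.
        x = torch.cat([depth, seed], dim=1)
        return self.dec(self.enc2(self.enc1(x)))

def seed_heatmap(h, w, click, sigma=10.0):
    """Gaussian bump centred on the user click (row, col)."""
    ys = torch.arange(h, dtype=torch.float32).view(-1, 1)
    xs = torch.arange(w, dtype=torch.float32).view(1, -1)
    d2 = (ys - click[0]) ** 2 + (xs - click[1]) ** 2
    return torch.exp(-d2 / (2 * sigma ** 2)).view(1, 1, h, w)

def contour_weighted_bce(logits, gt_mask, edge_weight=5.0):
    """Assumed patch-free, contour-oriented loss: BCE over the whole image,
    up-weighted on the mask boundary derived from the mask itself."""
    dil = F.max_pool2d(gt_mask, 3, stride=1, padding=1)    # dilation
    ero = -F.max_pool2d(-gt_mask, 3, stride=1, padding=1)  # erosion
    boundary = (dil - ero).clamp(0, 1)                     # morphological gradient
    weights = 1.0 + edge_weight * boundary
    return F.binary_cross_entropy_with_logits(logits, gt_mask, weight=weights)

# Usage: one depth map, one click, one training step.
net = SeedSegNet()
depth = torch.rand(1, 1, 64, 64)               # stand-in for a rendered depth map
seed = seed_heatmap(64, 64, click=(32, 40))
gt = (torch.rand(1, 1, 64, 64) > 0.5).float()  # placeholder ground-truth mask
loss = contour_weighted_bce(net(depth, seed), gt)
loss.backward()
```

Feeding the seed as an extra input channel keeps the network fully convolutional, so a single forward pass segments the clicked instance at full resolution, consistent with the patch-free formulation the abstract describes.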

Author information

Correspondence to Matthieu Grard.


Copyright information

© 2019 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Grard, M., Brégier, R., Sella, F., Dellandréa, E., Chen, L. (2019). Object Segmentation in Depth Maps with One User Click and a Synthetically Trained Fully Convolutional Network. In: Ficuciello, F., Ruggiero, F., Finzi, A. (eds) Human Friendly Robotics. Springer Proceedings in Advanced Robotics, vol 7. Springer, Cham. https://doi.org/10.1007/978-3-319-89327-3_16
