Object Segmentation in Depth Maps with One User Click and a Synthetically Trained Fully Convolutional Network

Conference paper
Human Friendly Robotics

Part of the book series: Springer Proceedings in Advanced Robotics (SPAR, volume 7)

Abstract

With more and more household objects designed for planned obsolescence and consumed by a fast-growing population, hazardous waste recycling has become a critical challenge. Given the large variability of household waste, current recycling platforms mostly rely on human operators to analyze the scene, typically composed of many object instances piled up in bulk. Assisting these operators by robotizing the unitary extraction of objects is a key step toward speeding up this tedious process. Whereas supervised deep learning has proven highly effective for such object-level scene understanding, e.g., generic object detection and segmentation in everyday scenes, it requires large sets of per-pixel labeled images, which are rarely available in many application contexts, including industrial robotics. We thus propose a step towards a practical interactive application for generating an object-oriented robotic grasp, requiring as inputs only one depth map of the scene and one user click on the next object to extract. More precisely, this paper addresses the intermediate problem of object segmentation in top views of piles of bulk objects, given a pixel location, referred to as a seed, provided interactively by a human operator. We propose a twofold framework for generating edge-driven instance segments. First, we repurpose a state-of-the-art fully convolutional object contour detector for seed-based instance segmentation by introducing the notion of edge-mask duality with a novel patch-free, contour-oriented loss function. Second, we train the model using only synthetic scenes instead of manually labeled real data. Our experimental results show that training an encoder-decoder network with edge-mask duality, as proposed, outperforms a state-of-the-art patch-based network in this application context.
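
Since the full text is paywalled here, a short sketch may help make the abstract's pipeline concrete. Everything below is an illustrative assumption rather than the authors' implementation: the user click is encoded as a Gaussian heatmap and concatenated with the depth map as a second input channel of a small encoder-decoder, and the loss is a patch-free, contour-oriented variant of binary cross-entropy that up-weights the ground-truth mask's own boundary — one plausible reading of edge-mask duality, since the instance contour is recoverable from the mask via a morphological gradient, with no separate edge annotations or patch sampling. SeedSegNet, seed_heatmap, and contour_weighted_bce are hypothetical names.

```python
# Illustrative sketch only (hypothetical names; not the paper's released code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeedSegNet(nn.Module):
    """Toy encoder-decoder: depth map + seed heatmap -> instance mask logits."""
    def __init__(self, base=16):
        super().__init__()
        self.enc1 = nn.Sequential(
            nn.Conv2d(2, base, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(base, base, 3, padding=1), nn.ReLU(inplace=True))
        self.enc2 = nn.Sequential(
            nn.Conv2d(base, base * 2, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(base, 1, 3, padding=1))  # per-pixel logits of the clicked mask

    def forward(self, depth, seed):
        # Seed conditioning: the user click enters as an extra input channel.
        x = torch.cat([depth, seed], dim=1)
        return self.dec(self.enc2(self.enc1(x)))

def seed_heatmap(h, w, click, sigma=10.0):
    """Gaussian bump centred on the user click (row, col)."""
    ys = torch.arange(h, dtype=torch.float32).view(-1, 1)
    xs = torch.arange(w, dtype=torch.float32).view(1, -1)
    d2 = (ys - click[0]) ** 2 + (xs - click[1]) ** 2
    return torch.exp(-d2 / (2 * sigma ** 2)).view(1, 1, h, w)

def contour_weighted_bce(logits, gt_mask, edge_weight=5.0):
    """Assumed patch-free, contour-oriented loss: BCE over the whole image,
    up-weighted on the mask boundary derived from the mask itself."""
    dil = F.max_pool2d(gt_mask, 3, stride=1, padding=1)    # dilation
    ero = -F.max_pool2d(-gt_mask, 3, stride=1, padding=1)  # erosion
    boundary = (dil - ero).clamp(0, 1)                     # morphological gradient
    weights = 1.0 + edge_weight * boundary
    return F.binary_cross_entropy_with_logits(logits, gt_mask, weight=weights)

# Usage: one depth map, one click, one training step.
net = SeedSegNet()
depth = torch.rand(1, 1, 64, 64)               # stand-in for a rendered depth map
seed = seed_heatmap(64, 64, click=(32, 40))
gt = (torch.rand(1, 1, 64, 64) > 0.5).float()  # placeholder ground-truth mask
loss = contour_weighted_bce(net(depth, seed), gt)
loss.backward()
```

Feeding the seed as an extra input channel keeps the network fully convolutional, so a single forward pass segments the clicked instance at full resolution, consistent with the patch-free formulation the abstract describes.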

Author information

Correspondence to Matthieu Grard.


Copyright information

© 2019 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Grard, M., Brégier, R., Sella, F., Dellandréa, E., Chen, L. (2019). Object Segmentation in Depth Maps with One User Click and a Synthetically Trained Fully Convolutional Network. In: Ficuciello, F., Ruggiero, F., Finzi, A. (eds) Human Friendly Robotics. Springer Proceedings in Advanced Robotics, vol 7. Springer, Cham. https://doi.org/10.1007/978-3-319-89327-3_16
