
Panoptic Segmentation in Industrial Environments Using Synthetic and Real Data

  • Conference paper
  • Published in: Image Analysis and Processing – ICIAP 2022 (ICIAP 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13232)

Abstract

Understanding the relations between a user and the surrounding environment is instrumental to assisting workers on a worksite. For instance, understanding which objects a user is interacting with, from images and video collected through a wearable device, can be used to inform the worker about the usage of specific objects in order to improve productivity and prevent accidents. Although modern vision systems can rely on advanced algorithms for object detection and for semantic and panoptic segmentation, these methods still require large quantities of domain-specific labeled data, which can be difficult to obtain in industrial scenarios. Motivated by this observation, we propose a pipeline to generate synthetic images from 3D models of real environments and real objects. The generated images are automatically labeled and hence effortless to obtain. Using the proposed pipeline, we generate a dataset of synthetic images automatically labeled for panoptic segmentation, complemented by a small number of manually labeled real images for fine-tuning. Experiments show that the use of synthetic images drastically reduces the number of real images needed to obtain reasonable panoptic segmentation performance.
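The panoptic segmentation performance mentioned above is conventionally measured with the Panoptic Quality (PQ) metric: the sum of IoUs over matched segments divided by |TP| + 0.5|FP| + 0.5|FN|, where predicted and ground-truth segments match when their IoU exceeds 0.5. The following is a minimal illustrative sketch of that metric (not code from the paper), representing segments as sets of pixel indices:

```python
def iou(a, b):
    """IoU between two segments, each given as a set of pixel indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def panoptic_quality(pred_segments, gt_segments):
    """Panoptic Quality over lists of segments (sets of pixel indices).

    A predicted segment matches a ground-truth segment when IoU > 0.5;
    this threshold guarantees each segment has at most one match, so a
    simple greedy pass is sufficient.
    """
    matched_ious = []      # IoUs of true-positive matches
    used_pred = set()
    for g in gt_segments:
        for i, p in enumerate(pred_segments):
            if i in used_pred:
                continue
            v = iou(p, g)
            if v > 0.5:
                matched_ious.append(v)
                used_pred.add(i)
                break
    tp = len(matched_ious)
    fp = len(pred_segments) - tp   # predictions left unmatched
    fn = len(gt_segments) - tp     # ground truth left unmatched
    denom = tp + 0.5 * fp + 0.5 * fn
    return sum(matched_ious) / denom if denom else 0.0
```

For example, a prediction that covers 3 of 4 pixels of one ground-truth segment (IoU 0.75, a TP) while adding one spurious segment (FP) and missing another (FN) scores PQ = 0.75 / (1 + 0.5 + 0.5) = 0.375.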


Notes

  1. https://ego4d-data.org/.
  2. The dataset is available at https://iplab.dmi.unict.it/ENIGMA_SEG/.
  3. https://github.com/facebookresearch/detectron2.
  4. https://matterport.com/.
  5. http://unity3d.com/unity/.
  6. https://www.blender.org/.
  7. https://www.artec3d.com/.


Acknowledgements

This research is supported by Next Vision (https://www.nextvisionlab.it/) s.r.l., and the project MEGABIT - PIAno di inCEntivi per la RIcerca di Ateneo 2020/2022 (PIACERI) - linea di intervento 2, DMI - University of Catania.

Author information

Corresponding author: Antonino Furnari.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Quattrocchi, C., Di Mauro, D., Furnari, A., Farinella, G.M. (2022). Panoptic Segmentation in Industrial Environments Using Synthetic and Real Data. In: Sclaroff, S., Distante, C., Leo, M., Farinella, G.M., Tombari, F. (eds) Image Analysis and Processing – ICIAP 2022. ICIAP 2022. Lecture Notes in Computer Science, vol 13232. Springer, Cham. https://doi.org/10.1007/978-3-031-06430-2_23


  • DOI: https://doi.org/10.1007/978-3-031-06430-2_23


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06429-6

  • Online ISBN: 978-3-031-06430-2

  • eBook Packages: Computer Science, Computer Science (R0)
