
Panoptic Segmentation in Industrial Environments Using Synthetic and Real Data

  • Conference paper
  • Published in: Image Analysis and Processing – ICIAP 2022 (ICIAP 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13232)

Abstract

Understanding the relations between a user and the surrounding environment is instrumental to assisting workers on a worksite. For instance, understanding which objects a user is interacting with, from images and video collected through a wearable device, can be used to inform the worker about the usage of specific objects in order to improve productivity and prevent accidents. Although modern vision systems can rely on advanced algorithms for object detection and for semantic and panoptic segmentation, these methods still require large quantities of domain-specific labeled data, which can be difficult to obtain in industrial scenarios. Motivated by this observation, we propose a pipeline to generate synthetic images from 3D models of real environments and real objects. The generated images are automatically labeled and hence effortless to obtain. Using the proposed pipeline, we generate a dataset of synthetic images automatically labeled for panoptic segmentation, complemented by a small number of manually labeled real images for fine-tuning. Experiments show that the use of synthetic images drastically reduces the number of real images needed to obtain reasonable panoptic segmentation performance.
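The panoptic segmentation performance mentioned above is conventionally measured with the Panoptic Quality (PQ) metric: the sum of IoUs over matched segments divided by |TP| + 0.5|FP| + 0.5|FN|, where predicted and ground-truth segments match when their IoU exceeds 0.5. The following is a minimal illustrative sketch of that metric (not code from the paper), representing segments as sets of pixel indices:

```python
def iou(a, b):
    """IoU between two segments, each given as a set of pixel indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def panoptic_quality(pred_segments, gt_segments):
    """Panoptic Quality over lists of segments (sets of pixel indices).

    A predicted segment matches a ground-truth segment when IoU > 0.5;
    this threshold guarantees each segment has at most one match, so a
    simple greedy pass is sufficient.
    """
    matched_ious = []      # IoUs of true-positive matches
    used_pred = set()
    for g in gt_segments:
        for i, p in enumerate(pred_segments):
            if i in used_pred:
                continue
            v = iou(p, g)
            if v > 0.5:
                matched_ious.append(v)
                used_pred.add(i)
                break
    tp = len(matched_ious)
    fp = len(pred_segments) - tp   # predictions left unmatched
    fn = len(gt_segments) - tp     # ground truth left unmatched
    denom = tp + 0.5 * fp + 0.5 * fn
    return sum(matched_ious) / denom if denom else 0.0
```

For example, a prediction that covers 3 of 4 pixels of one ground-truth segment (IoU 0.75, a TP) while adding one spurious segment (FP) and missing another (FN) scores PQ = 0.75 / (1 + 0.5 + 0.5) = 0.375.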


Notes

  1. https://ego4d-data.org/.
  2. The dataset is available at https://iplab.dmi.unict.it/ENIGMA_SEG/.
  3. https://github.com/facebookresearch/detectron2.
  4. https://matterport.com/.
  5. http://unity3d.com/unity/.
  6. https://www.blender.org/.
  7. https://www.artec3d.com/.


Acknowledgements

This research is supported by Next Vision (https://www.nextvisionlab.it/) s.r.l., and the project MEGABIT - PIAno di inCEntivi per la RIcerca di Ateneo 2020/2022 (PIACERI) - linea di intervento 2, DMI - University of Catania.

Author information

Corresponding author: Antonino Furnari.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Quattrocchi, C., Di Mauro, D., Furnari, A., Farinella, G.M. (2022). Panoptic Segmentation in Industrial Environments Using Synthetic and Real Data. In: Sclaroff, S., Distante, C., Leo, M., Farinella, G.M., Tombari, F. (eds) Image Analysis and Processing – ICIAP 2022. ICIAP 2022. Lecture Notes in Computer Science, vol 13232. Springer, Cham. https://doi.org/10.1007/978-3-031-06430-2_23


  • DOI: https://doi.org/10.1007/978-3-031-06430-2_23


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06429-6

  • Online ISBN: 978-3-031-06430-2

  • eBook Packages: Computer Science, Computer Science (R0)
