The ETS2 Dataset, Synthetic Data from Video Games for Monocular Depth Estimation

María-Arribas, David; Cuesta-Infante, Alfredo; Pantrigo, Juan J.

doi:10.1007/978-3-031-36616-1_30

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14062)

Included in the following conference series:

Iberian Conference on Pattern Recognition and Image Analysis

Abstract

In this work, we present a new dataset for monocular depth estimation created by extracting images, dense depth maps, and odometer data from a realistic video game simulation, Euro Truck Simulator 2™. The dataset is used to train state-of-the-art depth estimation models in both supervised and unsupervised ways, and the trained models are evaluated against real-world sequences. Our results demonstrate that models trained exclusively on synthetic data achieve satisfactory performance in the real domain. The quantitative evaluation sheds light on possible causes of the domain gap in monocular depth estimation. Specifically, we discuss the effects of coarse-grained ground-truth depth maps in contrast to fine-grained depth estimates. The dataset and the code for data extraction and experiments are released as open source.
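The abstract does not say which metrics the quantitative evaluation uses. As a rough illustration only, the sketch below computes three error measures that are widely reported in the monocular depth estimation literature: absolute relative error, root mean squared error, and the fraction of pixels whose prediction-to-ground-truth ratio is below 1.25. The function name `depth_metrics`, the flat-list input format, and the 80 m depth cap are assumptions made for this example, not details taken from the paper.

```python
import math


def depth_metrics(pred, gt, min_depth=1e-3, max_depth=80.0):
    """Compute common depth error metrics over valid pixels.

    pred, gt: flat sequences of depths in the same units (e.g. metres).
    Pixels whose ground truth lies outside [min_depth, max_depth] are
    skipped; the default bounds are an assumption for this sketch.
    """
    # Keep only pixels with a usable ground-truth value.
    pairs = [(p, g) for p, g in zip(pred, gt) if min_depth <= g <= max_depth]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")

    # Clamp predictions to the same range so ratios stay finite.
    pairs = [(min(max(p, min_depth), max_depth), g) for p, g in pairs]
    n = len(pairs)

    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Threshold accuracy: share of pixels with max(p/g, g/p) < 1.25.
    delta1 = sum(1 for p, g in pairs if max(p / g, g / p) < 1.25) / n

    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}


if __name__ == "__main__":
    # The third pixel has no ground truth (0.0) and is ignored.
    print(depth_metrics([10.0, 20.0, 30.0], [10.0, 40.0, 0.0]))
```

Masking out pixels without valid ground truth matters when the reference depth is sparse or coarse, since only the pixels that carry a measurement can contribute to the error.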

This research work has been supported by project TED2021-129162B-C22, funded by the Recovery and Resilience Facility program from the NextGenerationEU and the Spanish Research Agency (Agencia Estatal de Investigación); and PID2021-128362OB-I00, funded by the Spanish Plan for Scientific and Technical Research and Innovation of the Spanish Research Agency.

Author information

Authors and Affiliations

Computer Science and Statistics Department, Universidad Rey Juan Carlos, Calle Tulipán s/n, 28933, Móstoles, Madrid, Spain
David María-Arribas, Alfredo Cuesta-Infante & Juan J. Pantrigo

Corresponding author

Correspondence to David María-Arribas.

Editor information

Editors and Affiliations

University of Alicante, Alicante, Spain
Antonio Pertusa
University of Alicante, Alicante, Spain
Antonio Javier Gallego
Universitat Politècnica de València, Valencia, Spain
Joan Andreu Sánchez
IPO Porto, Coimbra, Portugal
Inês Domingues

About this paper

Cite this paper

María-Arribas, D., Cuesta-Infante, A., Pantrigo, J.J. (2023). The ETS2 Dataset, Synthetic Data from Video Games for Monocular Depth Estimation. In: Pertusa, A., Gallego, A.J., Sánchez, J.A., Domingues, I. (eds) Pattern Recognition and Image Analysis. IbPRIA 2023. Lecture Notes in Computer Science, vol 14062. Springer, Cham. https://doi.org/10.1007/978-3-031-36616-1_30

DOI: https://doi.org/10.1007/978-3-031-36616-1_30
Published: 25 June 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36615-4
Online ISBN: 978-3-031-36616-1
eBook Packages: Computer Science, Computer Science (R0)
