GUILD - A Generator for Usable Images in Large-Scale Datasets

Roch, Peter; Shahbaz Nejad, Bijan; Handte, Marcus; Marrón, Pedro José

doi:10.1007/978-3-031-20716-7_19

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13599)

Included in the following conference series:

International Symposium on Visual Computing

Abstract

Large image datasets are important for many areas of computer vision. However, creating datasets containing thousands or millions of labeled images is time-consuming. Instead of collecting a large dataset manually, we propose a framework for generating large-scale datasets synthetically. Our framework is capable of generating realistic-looking images with varying environmental conditions while automatically creating labels. To evaluate the usefulness of such a dataset, we generate two datasets containing vehicle images and use them to train a neural network. We then compare its detection accuracy to that of the same neural network trained with images from existing datasets. The experiments show that our generated datasets are well suited to training neural networks and achieve accuracy comparable to existing datasets containing real photographs, while being much faster to create.

References

Amazon Mechanical Turk Inc: Amazon Mechanical Turk (2018). www.mturk.com. (Accessed: 09 Nov 2021)
Barz, B., Denzler, J.: Deep learning on small datasets without pre-training using cosine loss. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2020)
Birhane, A., Prabhu, V.U.: Large image datasets: A pyrrhic win for computer vision? In: WACV (2021)
Bochkovskiy, A., Wang, C.Y., Liao, H.Y.M.: Yolov4: Optimal speed and accuracy of object detection. CoRR (2020)
Bryant, D., Howard, A.: A comparative analysis of emotion-detecting ai systems with respect to algorithm performance and dataset diversity. In: Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society (2019)
Chang, A.X., et al.: Shapenet: An information-rich 3d model repository. CoRR (2015)
Creswell, A., White, T., Dumoulin, V., Arulkumaran, K., Sengupta, B., Bharath, A.A.: Generative adversarial networks: An overview. IEEE Signal Process. Mag. (2018)
Deng, J., Dong, W., Socher, R., Li, L., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. In: CVPR (2009)
Doan, A.D., Jawaid, A.M., Do, T.T., Chin, T.J.: G2D: from GTA to data. CoRR (2018)
Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., Koltun, V.: CARLA: An open urban driving simulator. CoRL (2017)
Everingham, M., Eslami, S.M.A., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The pascal visual object classes challenge: A retrospective. In: IJCV (2015)
Flickr Inc: Flickr (2021). www.flickr.com. (Accessed: 08 Nov 2021)
Google LLC: Open Images V6 - Description (2020). www.storage.googleapis.com/openimages/web/factsfigures.html. (Accessed: 18 Jul 2022)
Gupta, A., Dollár, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR (2019)
Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network (2015)
Huang, G.B., Ramesh, M., Berg, T., Learned-Miller, E.: Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Tech. rep, UMass Amherst (2007)
IAM, Universität Duisburg-Essen: Taxiladekonzept für Elektrotaxis im öffentlichen Raum (2022). https://talako.uni-due.de. (Accessed: 14 Jan 2022)
Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: CVPR (2019)
Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., Aila, T.: Analyzing and improving the image quality of stylegan. In: CVPR (2020)
Krause, J., Stark, M., Deng, J., Fei-Fei, L.: 3d object representations for fine-grained categorization. In: ICCVW (2013)
Kutsenko, D.: Rpg/fps game assets for pc/mobile (industrial set v2.0) (2021). https://assetstore.unity.com/packages/3d/environments/industrial/rpg-fps-game-assets-for-pc-mobile-industrial-set-v2-0-86679. (Accessed: 11 Jul 2022)
Kuznetsova, A., et al.: The open images dataset v4. In: IJCV (2020)
Le, H.A., Mensink, T., Das, P., Karaoglu, S., Gevers, T.: Eden: Multimodal synthetic dataset of enclosed garden scenes. In: WACV (2021)
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Maji, S., Rahtu, E., Kannala, J., Blaschko, M.B., Vedaldi, A.: Fine-grained visual classification of aircraft. CoRR (2013)
Mousavi, M., Khanal, A., Estrada, R.: Ai playground: Unreal engine-based data ablation tool for deep learning. In: ISVC (2020)
Olson, M., Wyner, A., Berk, R.: Modern neural networks generalize on small data sets. In: Advances in Neural Information Processing Systems (2018)
Parkhi, O.M., Vedaldi, A., Zisserman, A., Jawahar, C.V.: Cats and dogs. In: CVPR (2012)
Qiu, W., et al.: Unrealcv: Virtual worlds for computer vision. In: ACM MM (2017)
Reddy, N.D., Vo, M., Narasimhan, S.G.: Occlusion-net: 2d/3d occluded keypoint localization using graph networks. In: CVPR (2019)
Richter, S.R., Hayder, Z., Koltun, V.: Playing for benchmarks. In: ICCV (2017)
Richter, S.R., Vineet, V., Roth, S., Koltun, V.: Playing for data: ground truth from computer games. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9906, pp. 102–118. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46475-6_7
Robust Vision Challenge: Robust Vision Challenge 2020 (2020). www.robustvision.net. (Accessed: 08 Nov 2021)
Roch, P., Shahbaz Nejad, B., Handte, M., Marrón, P.J.: Car pose estimation through wheel detection. In: ISVC (2021)
Ros, G., Sellart, L., Materzynska, J., Vazquez, D., Lopez, A.M.: The synthia dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In: CVPR (2016)
Russakovsky, O., et al.: Imagenet large scale visual recognition challenge. In: IJCV (2015)
Shah, S., Dey, D., Lovett, C., Kapoor, A.: Airsim: High-fidelity visual and physical simulation for autonomous vehicles. In: FSR (2017)
Su, H., Qi, C.R., Li, Y., Guibas, L.J.: Render for cnn: Viewpoint estimation in images using cnns trained with rendered 3d model views. In: 2015 IEEE International Conference on Computer Vision (ICCV) (2015)
Sun, C., Shrivastava, A., Singh, S., Gupta, A.: Revisiting unreasonable effectiveness of data in deep learning era. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2017)
Tafazzoli, F., Frigui, H., Nishiyama, K.: A large and diverse dataset for improved vehicle make and model recognition. In: CVPRW (2017)
Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., Abbeel, P.: Domain randomization for transferring deep neural networks from simulation to the real world. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2017)
Torralba, A., Fergus, R., Freeman, W.T.: 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. (2008)
Unity Technologies: Unity (2021). www.unity.com. (Accessed: 07 Dec 2021)
Unity Technologies: Unity Asset Store. https://assetstore.unity.com (2021). (Accessed: 07 Dec 2021)
Weinzaepfel, P., Csurka, G., Cabon, Y., Humenberger, M.: Visual localization by learning objects-of-interest dense match regression. In: CVPR (2019)
Wohlhart, P., Lepetit, V.: Learning descriptors for object recognition and 3d pose estimation. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
Xian, Y., Lampert, C.H., Schiele, B., Akata, Z.: Zero-shot learning-a comprehensive evaluation of the good, the bad and the ugly. In: TPAMI (2019)
Yang, L., Luo, P., Loy, C.C., Tang, X.: A large-scale car dataset for fine-grained categorization and verification. In: CVPR (2015)
Yang, S., Luo, P., Loy, C.C., Tang, X.: Wider face: A face detection benchmark. In: CVPR (2016)
Zakharov, E., Shysheya, A., Burkov, E., Lempitsky, V.: Few-shot adversarial learning of realistic neural talking head models. In: ICCV (2019)

Acknowledgment

This research is funded by the Bundesministerium für Wirtschaft und Energie as part of the TALAKO project [17] (grant number 01MZ19002A). The authors wish to thank Maximilian Fischer for implementing a prototypical system during his thesis.

Author information

Authors and Affiliations

University of Duisburg-Essen, Essen, Germany
Peter Roch, Bijan Shahbaz Nejad, Marcus Handte & Pedro José Marrón

Corresponding author

Correspondence to Peter Roch.

Editor information

Editors and Affiliations

University of Nevada, Reno, NV, USA
George Bebis
University of Illinois Urbana-Champaign, Urbana, IL, USA
Bo Li
National University of Singapore, Singapore, Singapore
Angela Yao
Microsoft Research Asia, Beijing, China
Yang Liu
University of Missouri, Columbia, MO, USA
Ye Duan
City University of Hong Kong, Kowloon, Hong Kong
Manfred Lau
Idaho National Laboratory, Idaho Falls, ID, USA
Rajiv Khadka
Salesforce, Seattle, WA, USA
Ana Crisan
Tufts University, Medford, MA, USA
Remco Chang

About this paper

Cite this paper

Roch, P., Shahbaz Nejad, B., Handte, M., Marrón, P.J. (2022). GUILD - A Generator for Usable Images in Large-Scale Datasets. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2022. Lecture Notes in Computer Science, vol 13599. Springer, Cham. https://doi.org/10.1007/978-3-031-20716-7_19

DOI: https://doi.org/10.1007/978-3-031-20716-7_19
Published: 10 December 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20715-0
Online ISBN: 978-3-031-20716-7
eBook Packages: Computer Science, Computer Science (R0)
