GUILD - A Generator for Usable Images in Large-Scale Datasets

  • Conference paper
Advances in Visual Computing (ISVC 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13599)

Abstract

Large image datasets are important for many aspects of computer vision. However, creating datasets containing thousands or millions of labeled images is time-consuming. Instead of manually collecting a large dataset, we propose a framework for generating large-scale datasets synthetically. Our framework is capable of generating realistic-looking images with varying environmental conditions while automatically creating labels. To evaluate the usefulness of such a dataset, we generate two datasets containing vehicle images and use them to train a neural network. We then compare its detection accuracy to that of the same neural network trained with images from existing datasets. The experiments show that our generated datasets are well suited to training neural networks and achieve accuracy comparable to existing datasets containing real photographs, while being much faster to create.

References

  1. Amazon Mechanical Turk Inc: Amazon Mechanical Turk (2018). www.mturk.com. (Accessed: 09 Nov 2021)

  2. Barz, B., Denzler, J.: Deep learning on small datasets without pre-training using cosine loss. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2020)

  3. Birhane, A., Prabhu, V.U.: Large image datasets: A pyrrhic win for computer vision? In: WACV (2021)

  4. Bochkovskiy, A., Wang, C.Y., Liao, H.Y.M.: Yolov4: Optimal speed and accuracy of object detection. CoRR (2020)

  5. Bryant, D., Howard, A.: A comparative analysis of emotion-detecting ai systems with respect to algorithm performance and dataset diversity. In: Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society (2019)

  6. Chang, A.X., et al.: Shapenet: An information-rich 3d model repository. CoRR (2015)

  7. Creswell, A., White, T., Dumoulin, V., Arulkumaran, K., Sengupta, B., Bharath, A.A.: Generative adversarial networks: An overview. IEEE Signal Process. Mag. (2018)

  8. Deng, J., Dong, W., Socher, R., Li, L., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. In: CVPR (2009)

  9. Doan, A.D., Jawaid, A.M., Do, T.T., Chin, T.J.: G2D: from GTA to data. CoRR (2018)

  10. Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., Koltun, V.: CARLA: An open urban driving simulator. CoRL (2017)

  11. Everingham, M., Eslami, S.M.A., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The pascal visual object classes challenge: A retrospective. In: IJCV (2015)

  12. Flickr Inc: Flickr (2021). www.flickr.com. (Accessed: 08 Nov 2021)

  13. Google LLC: Open Images V6 - Description (2020). www.storage.googleapis.com/openimages/web/factsfigures.html. (Accessed: 18 Jul 2022)

  14. Gupta, A., Dollár, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR (2019)

  15. Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network (2015)

  16. Huang, G.B., Ramesh, M., Berg, T., Learned-Miller, E.: Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Tech. rep, UMass Amherst (2007)

  17. IAM, Universität Duisburg-Essen: Taxiladekonzept für Elektrotaxis im öffentlichen Raum (2022). https://talako.uni-due.de. (Accessed: 14 Jan 2022)

  18. Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: CVPR (2019)

  19. Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., Aila, T.: Analyzing and improving the image quality of stylegan. In: CVPR (2020)

  20. Krause, J., Stark, M., Deng, J., Fei-Fei, L.: 3d object representations for fine-grained categorization. In: ICCVW (2013)

  21. Kutsenko, D.: Rpg/fps game assets for pc/mobile (industrial set v2.0) (2021). https://assetstore.unity.com/packages/3d/environments/industrial/rpg-fps-game-assets-for-pc-mobile-industrial-set-v2-0-86679. (Accessed: 11 Jul 2022)

  22. Kuznetsova, A., et al.: The open images dataset v4. In: IJCV (2020)

  23. Le, H.A., Mensink, T., Das, P., Karaoglu, S., Gevers, T.: Eden: Multimodal synthetic dataset of enclosed garden scenes. In: WACV (2021)

  24. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48

  25. Maji, S., Rahtu, E., Kannala, J., Blaschko, M.B., Vedaldi, A.: Fine-grained visual classification of aircraft. CoRR (2013)

  26. Mousavi, M., Khanal, A., Estrada, R.: Ai playground: Unreal engine-based data ablation tool for deep learning. In: ISVC (2020)

  27. Olson, M., Wyner, A., Berk, R.: Modern neural networks generalize on small data sets. In: Advances in Neural Information Processing Systems (2018)

  28. Parkhi, O.M., Vedaldi, A., Zisserman, A., Jawahar, C.V.: Cats and dogs. In: CVPR (2012)

  29. Qiu, W., et al.: Unrealcv: Virtual worlds for computer vision. In: ACM MM (2017)

  30. Reddy, N.D., Vo, M., Narasimhan, S.G.: Occlusion-net: 2d/3d occluded keypoint localization using graph networks. In: CVPR (2019)

  31. Richter, S.R., Hayder, Z., Koltun, V.: Playing for benchmarks. In: ICCV (2017)

  32. Richter, S.R., Vineet, V., Roth, S., Koltun, V.: Playing for data: ground truth from computer games. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9906, pp. 102–118. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46475-6_7

  33. Robust Vision Challenge: Robust Vision Challenge 2020 (2020). www.robustvision.net. (Accessed: 08 Nov 2021)

  34. Roch, P., Shahbaz Nejad, B., Handte, M., Marrón, P.J.: Car pose estimation through wheel detection. In: ISVC (2021)

  35. Ros, G., Sellart, L., Materzynska, J., Vazquez, D., Lopez, A.M.: The synthia dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In: CVPR (2016)

  36. Russakovsky, O., et al.: Imagenet large scale visual recognition challenge. In: IJCV (2015)

  37. Shah, S., Dey, D., Lovett, C., Kapoor, A.: Airsim: High-fidelity visual and physical simulation for autonomous vehicles. In: FSR (2017)

  38. Su, H., Qi, C.R., Li, Y., Guibas, L.J.: Render for cnn: Viewpoint estimation in images using cnns trained with rendered 3d model views. In: 2015 IEEE International Conference on Computer Vision (ICCV) (2015)

  39. Sun, C., Shrivastava, A., Singh, S., Gupta, A.: Revisiting unreasonable effectiveness of data in deep learning era. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2017)

  40. Tafazzoli, F., Frigui, H., Nishiyama, K.: A large and diverse dataset for improved vehicle make and model recognition. In: CVPRW (2017)

  41. Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., Abbeel, P.: Domain randomization for transferring deep neural networks from simulation to the real world. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2017)

  42. Torralba, A., Fergus, R., Freeman, W.T.: 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. (2008)

  43. Unity Technologies: Unity (2021). www.unity.com. (Accessed: 07 Dec 2021)

  44. Unity Technologies: Unity Asset Store. https://assetstore.unity.com (2021). (Accessed: 07 Dec 2021)

  45. Weinzaepfel, P., Csurka, G., Cabon, Y., Humenberger, M.: Visual localization by learning objects-of-interest dense match regression. In: CVPR (2019)

  46. Wohlhart, P., Lepetit, V.: Learning descriptors for object recognition and 3d pose estimation. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)

  47. Xian, Y., Lampert, C.H., Schiele, B., Akata, Z.: Zero-shot learning-a comprehensive evaluation of the good, the bad and the ugly. In: TPAMI (2019)

  48. Yang, L., Luo, P., Loy, C.C., Tang, X.: A large-scale car dataset for fine-grained categorization and verification. In: CVPR (2015)

  49. Yang, S., Luo, P., Loy, C.C., Tang, X.: Wider face: A face detection benchmark. In: CVPR (2016)

  50. Zakharov, E., Shysheya, A., Burkov, E., Lempitsky, V.: Few-shot adversarial learning of realistic neural talking head models. In: ICCV (2019)

Acknowledgment

This research is funded by the Bundesministerium für Wirtschaft und Energie as part of the TALAKO project [17] (grant number 01MZ19002A). The authors wish to thank Maximilian Fischer for implementing a prototypical system during his thesis.

Author information

Corresponding author

Correspondence to Peter Roch.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Roch, P., Shahbaz Nejad, B., Handte, M., Marrón, P.J. (2022). GUILD - A Generator for Usable Images in Large-Scale Datasets. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2022. Lecture Notes in Computer Science, vol 13599. Springer, Cham. https://doi.org/10.1007/978-3-031-20716-7_19

  • DOI: https://doi.org/10.1007/978-3-031-20716-7_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20715-0

  • Online ISBN: 978-3-031-20716-7

  • eBook Packages: Computer Science, Computer Science (R0)
