Abstract
Large image datasets are important for many aspects of computer vision. However, creating datasets containing thousands or millions of labeled images is time-consuming. Instead of collecting a large dataset manually, we propose a framework for generating large-scale datasets synthetically. Our framework can generate realistic-looking images with varying environmental conditions while automatically creating labels. To evaluate the usefulness of such a dataset, we generate two datasets containing vehicle images and use them to train a neural network. We then compare its detection accuracy to that of the same neural network trained with images from existing datasets. The experiments show that our generated datasets are well suited to training neural networks and achieve accuracy comparable to existing datasets containing real photographs, while being much faster to create.
References
Amazon Mechanical Turk Inc: Amazon Mechanical Turk (2018). www.mturk.com. (Accessed: 09 Nov 2021)
Barz, B., Denzler, J.: Deep learning on small datasets without pre-training using cosine loss. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2020)
Birhane, A., Prabhu, V.U.: Large image datasets: A pyrrhic win for computer vision? In: WACV (2021)
Bochkovskiy, A., Wang, C.Y., Liao, H.Y.M.: Yolov4: Optimal speed and accuracy of object detection. CoRR (2020)
Bryant, D., Howard, A.: A comparative analysis of emotion-detecting ai systems with respect to algorithm performance and dataset diversity. In: Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society (2019)
Chang, A.X., et al.: Shapenet: An information-rich 3d model repository. CoRR (2015)
Creswell, A., White, T., Dumoulin, V., Arulkumaran, K., Sengupta, B., Bharath, A.A.: Generative adversarial networks: An overview. IEEE Signal Process. Mag. (2018)
Deng, J., Dong, W., Socher, R., Li, L., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. In: CVPR (2009)
Doan, A.D., Jawaid, A.M., Do, T.T., Chin, T.J.: G2D: from GTA to data. CoRR (2018)
Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., Koltun, V.: CARLA: An open urban driving simulator. CoRL (2017)
Everingham, M., Eslami, S.M.A., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The pascal visual object classes challenge: A retrospective. In: IJCV (2015)
Flickr Inc: Flickr (2021). www.flickr.com. (Accessed: 08 Nov 2021)
Google LLC: Open Images V6 - Description (2020). www.storage.googleapis.com/openimages/web/factsfigures.html. (Accessed: 18 Jul 2022)
Gupta, A., Dollár, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR (2019)
Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network (2015)
Huang, G.B., Ramesh, M., Berg, T., Learned-Miller, E.: Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Tech. rep., UMass Amherst (2007)
IAM, Universität Duisburg-Essen: Taxiladekonzept für Elektrotaxis im öffentlichen Raum (2022). https://talako.uni-due.de. (Accessed: 14 Jan 2022)
Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: CVPR (2019)
Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., Aila, T.: Analyzing and improving the image quality of stylegan. In: CVPR (2020)
Krause, J., Stark, M., Deng, J., Fei-Fei, L.: 3d object representations for fine-grained categorization. In: ICCVW (2013)
Kutsenko, D.: Rpg/fps game assets for pc/mobile (industrial set v2.0) (2021). https://assetstore.unity.com/packages/3d/environments/industrial/rpg-fps-game-assets-for-pc-mobile-industrial-set-v2-0-86679. (Accessed: 11 Jul 2022)
Kuznetsova, A., et al.: The open images dataset v4. In: IJCV (2020)
Le, H.A., Mensink, T., Das, P., Karaoglu, S., Gevers, T.: Eden: Multimodal synthetic dataset of enclosed garden scenes. In: WACV (2021)
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Maji, S., Rahtu, E., Kannala, J., Blaschko, M.B., Vedaldi, A.: Fine-grained visual classification of aircraft. CoRR (2013)
Mousavi, M., Khanal, A., Estrada, R.: Ai playground: Unreal engine-based data ablation tool for deep learning. In: ISVC (2020)
Olson, M., Wyner, A., Berk, R.: Modern neural networks generalize on small data sets. In: Advances in Neural Information Processing Systems (2018)
Parkhi, O.M., Vedaldi, A., Zisserman, A., Jawahar, C.V.: Cats and dogs. In: CVPR (2012)
Qiu, W., et al.: Unrealcv: Virtual worlds for computer vision. In: ACM MM (2017)
Reddy, N.D., Vo, M., Narasimhan, S.G.: Occlusion-net: 2d/3d occluded keypoint localization using graph networks. In: CVPR (2019)
Richter, S.R., Hayder, Z., Koltun, V.: Playing for benchmarks. In: ICCV (2017)
Richter, S.R., Vineet, V., Roth, S., Koltun, V.: Playing for data: ground truth from computer games. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9906, pp. 102–118. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46475-6_7
Robust Vision Challenge: Robust Vision Challenge 2020 (2020). www.robustvision.net. (Accessed: 08 Nov 2021)
Roch, P., Shahbaz Nejad, B., Handte, M., Marrón, P.J.: Car pose estimation through wheel detection. In: ISVC (2021)
Ros, G., Sellart, L., Materzynska, J., Vazquez, D., Lopez, A.M.: The synthia dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In: CVPR (2016)
Russakovsky, O., et al.: Imagenet large scale visual recognition challenge. In: IJCV (2015)
Shah, S., Dey, D., Lovett, C., Kapoor, A.: Airsim: High-fidelity visual and physical simulation for autonomous vehicles. In: FSR (2017)
Su, H., Qi, C.R., Li, Y., Guibas, L.J.: Render for cnn: Viewpoint estimation in images using cnns trained with rendered 3d model views. In: 2015 IEEE International Conference on Computer Vision (ICCV) (2015)
Sun, C., Shrivastava, A., Singh, S., Gupta, A.: Revisiting unreasonable effectiveness of data in deep learning era. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2017)
Tafazzoli, F., Frigui, H., Nishiyama, K.: A large and diverse dataset for improved vehicle make and model recognition. In: CVPRW (2017)
Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., Abbeel, P.: Domain randomization for transferring deep neural networks from simulation to the real world. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2017)
Torralba, A., Fergus, R., Freeman, W.T.: 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. (2008)
Unity Technologies: Unity (2021). www.unity.com. (Accessed: 07 Dec 2021)
Unity Technologies: Unity Asset Store. https://assetstore.unity.com (2021). (Accessed: 07 Dec 2021)
Weinzaepfel, P., Csurka, G., Cabon, Y., Humenberger, M.: Visual localization by learning objects-of-interest dense match regression. In: CVPR (2019)
Wohlhart, P., Lepetit, V.: Learning descriptors for object recognition and 3d pose estimation. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
Xian, Y., Lampert, C.H., Schiele, B., Akata, Z.: Zero-shot learning-a comprehensive evaluation of the good, the bad and the ugly. In: TPAMI (2019)
Yang, L., Luo, P., Loy, C.C., Tang, X.: A large-scale car dataset for fine-grained categorization and verification. In: CVPR (2015)
Yang, S., Luo, P., Loy, C.C., Tang, X.: Wider face: A face detection benchmark. In: CVPR (2016)
Zakharov, E., Shysheya, A., Burkov, E., Lempitsky, V.: Few-shot adversarial learning of realistic neural talking head models. In: ICCV (2019)
Acknowledgment
This research is funded by the Bundesministerium für Wirtschaft und Energie as part of the TALAKO project [17] (grant number 01MZ19002A). The authors wish to thank Maximilian Fischer for implementing a prototypical system during his thesis.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Roch, P., Shahbaz Nejad, B., Handte, M., Marrón, P.J. (2022). GUILD - A Generator for Usable Images in Large-Scale Datasets. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2022. Lecture Notes in Computer Science, vol 13599. Springer, Cham. https://doi.org/10.1007/978-3-031-20716-7_19
DOI: https://doi.org/10.1007/978-3-031-20716-7_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20715-0
Online ISBN: 978-3-031-20716-7
eBook Packages: Computer Science (R0)