Towards Visual Training Set Generation Framework

  • Conference paper
Advances in Computational Intelligence (IWANN 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10306)

Abstract

The performance of trained computer vision algorithms depends largely on the amount of data on which they are trained. Creating large labeled datasets is very expensive, so many researchers instead use synthetically generated images with automatic annotations. For this purpose we have created a general framework that allows researchers to generate a practically unlimited number of images from a set of 3D models, textures, and material settings. We leverage the Voxel Cone Tracing technology implemented by NVIDIA to render photorealistic images in real time without any precomputation. We have built this framework with two use cases in mind: (i) real-world applications, where a database of synthetically generated images could compensate for small or nonexistent datasets, and (ii) empirical testing of theoretical ideas by creating training sets with a known inner structure.
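
The abstract describes the pipeline only at a high level. As a rough, hypothetical illustration of the kind of asset-sampling loop such a framework implies, the Python sketch below draws random combinations of model, texture, material and camera pose, renders each scene, and stores the image together with its automatically generated label. The names (SceneConfig, sample_scene, render_fn) and the JSON label format are illustrative assumptions, not part of the paper; render_fn stands in for the real-time voxel-cone-tracing renderer.

```python
# Hypothetical sketch of a synthetic training-set generation loop (assumed API).
import json
import random
from dataclasses import dataclass, asdict
from pathlib import Path

@dataclass
class SceneConfig:
    model: str            # path to a 3D model file
    texture: str          # path to a texture image
    material: dict        # material parameters (e.g. roughness, metallic)
    camera_angle: float   # camera azimuth around the object, in degrees

def sample_scene(models, textures, materials):
    """Draw one random scene configuration from the asset pools."""
    return SceneConfig(
        model=random.choice(models),
        texture=random.choice(textures),
        material=random.choice(materials),
        camera_angle=random.uniform(0.0, 360.0),
    )

def generate_dataset(models, textures, materials, n_images, out_dir, render_fn):
    """Render n_images random scenes. render_fn stands in for the real-time
    renderer and is assumed to return the rendered image as PNG bytes."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i in range(n_images):
        cfg = sample_scene(models, textures, materials)
        image = render_fn(cfg)  # renderer call (assumed interface)
        (out / f"{i:06d}.png").write_bytes(image)
        # The annotation comes for free: the configuration that produced the image.
        (out / f"{i:06d}.json").write_text(json.dumps(asdict(cfg)))
```

Because every image is produced from a known configuration, its annotation is exact by construction, which is what makes such generated sets attractive both for data-scarce applications and for controlled experiments.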

Author information

Corresponding author

Correspondence to Jan Hůla.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Hůla, J., Perfilieva, I., Muzaheed, A.A.M. (2017). Towards Visual Training Set Generation Framework. In: Rojas, I., Joya, G., Catala, A. (eds) Advances in Computational Intelligence. IWANN 2017. Lecture Notes in Computer Science, vol 10306. Springer, Cham. https://doi.org/10.1007/978-3-319-59147-6_63

  • DOI: https://doi.org/10.1007/978-3-319-59147-6_63

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59146-9

  • Online ISBN: 978-3-319-59147-6

  • eBook Packages: Computer Science, Computer Science (R0)
