Generating 2.5D Photorealistic Synthetic Datasets for Training Machine Vision Algorithms

  • Conference paper
  • In: 15th International Conference on Soft Computing Models in Industrial and Environmental Applications (SOCO 2020)

Abstract

The continued success of deep convolutional neural networks (CNNs) in computer vision can be directly linked to the vast amounts of data and the tremendous processing resources available for training such non-linear models. However, the amount of available data varies significantly with the task. Robotic systems in particular usually rely on small datasets, since producing and annotating the data is highly robot- and task-specific (e.g. grasping) and therefore prohibitively expensive. A common practice for addressing this problem of small datasets in robotic vision is to reuse features already learned by a CNN on a large-scale task and apply them to a different, small-scale one. This transfer of learning shows promising results as an alternative, but it nevertheless cannot match the performance of a CNN trained from scratch for the specific task. Many researchers have therefore turned to synthetic datasets for training, since these can be produced easily and cost-effectively. The main issue with existing synthetic datasets is their lack of photorealism, both in terms of background and lighting. Herein, we propose a framework for generating completely synthetic datasets that include all the types of data that state-of-the-art object recognition and tracking algorithms need for their training. Thus, we can improve robotic perception without deploying the robot in time-consuming real-world scenarios.
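To make the kind of pipeline the abstract describes concrete, the sketch below shows one way a photorealistic 2.5D (RGB plus depth) sample could be rendered under an HDRI environment map, which provides realistic background and lighting at once. This is a minimal illustration only: the abstract does not name its tooling, so Blender's Python API (bpy), the HDRI file, and all paths below are assumptions, not the authors' implementation.

    # Minimal sketch: render one synthetic RGB-D ("2.5D") sample.
    # Assumptions: Blender 2.8x scripting API (bpy); file paths are placeholders.
    import bpy

    scene = bpy.context.scene
    scene.render.engine = 'CYCLES'      # physically based path tracer, for photorealism
    scene.render.resolution_x = 640     # match a typical RGB-D sensor resolution
    scene.render.resolution_y = 480

    # Photorealistic background and lighting from an HDRI environment map.
    world = scene.world
    world.use_nodes = True
    env = world.node_tree.nodes.new('ShaderNodeTexEnvironment')
    env.image = bpy.data.images.load('/path/to/environment.hdr')  # placeholder HDRI
    world.node_tree.links.new(env.outputs['Color'],
                              world.node_tree.nodes['Background'].inputs['Color'])

    # Enable the depth (Z) pass and write RGB + depth through the compositor.
    scene.view_layers[0].use_pass_z = True
    scene.use_nodes = True
    tree = scene.node_tree
    layers = tree.nodes.new('CompositorNodeRLayers')
    out = tree.nodes.new('CompositorNodeOutputFile')
    out.base_path = '/tmp/synthetic_dataset'  # placeholder output directory
    out.format.file_format = 'OPEN_EXR'       # EXR preserves metric depth values
    out.file_slots.new('depth')
    tree.links.new(layers.outputs['Image'], out.inputs['Image'])
    tree.links.new(layers.outputs['Depth'], out.inputs['depth'])

    bpy.ops.render.render(write_still=True)   # renders aligned RGB and depth images

Looping such a render over randomized object poses, camera viewpoints, and HDRI maps, and exporting the known poses alongside each frame, would yield annotated RGB-D training pairs without any manual labelling.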

Notes

  1. An example dataset generated using the proposed framework will be made publicly available upon publication of the paper at hand.

Acknowledgement

This work has been supported by the project “Co-production CeLL performing Human-Robot Collaborative AssEmbly (CoLLaboratE)”, funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 820767.

Author information

Corresponding author

Correspondence to Georgia Peleka.

Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Peleka, G., Mariolis, I., Tzovaras, D. (2021). Generating 2.5D Photorealistic Synthetic Datasets for Training Machine Vision Algorithms. In: Herrero, Á., Cambra, C., Urda, D., Sedano, J., Quintián, H., Corchado, E. (eds) 15th International Conference on Soft Computing Models in Industrial and Environmental Applications (SOCO 2020). SOCO 2020. Advances in Intelligent Systems and Computing, vol 1268. Springer, Cham. https://doi.org/10.1007/978-3-030-57802-2_61
