
Learning Typical 3D Representation from a Single 2D Correspondence Using 2D-3D Transformation Network

  • Conference paper
  • In: Proceedings of the 13th International Conference on Ubiquitous Information Management and Communication (IMCOM) 2019 (IMCOM 2019)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 935)

Abstract

Understanding 3D environments with deep learning is a subject of great interest in the computer vision community due to its wide variety of applications. However, learning good 3D representations is challenging for many reasons, including the high dimensionality of the data, the modeling complexity introduced by symmetric objects, the required computational cost, and expected variations that scale up by an order of 1.5 compared to learning 2D representations. In this paper, we address the problem of learning a typical 3D representation by transformation from its corresponding single 2D representation, using a deep-autoencoder-based Generative Adversarial Network termed the 2D-3D Transformation Network (2D-3D-TNET). The proposed model's objective combines the traditional GAN loss with an autoencoder loss, which allows the generator to effectively generate 3D objects corresponding to the input 2D images. Furthermore, instead of training the discriminator to distinguish real from generated images, we let it take both a 2D image and a 3D object as input and learn to discriminate whether they are a true correspondence. The discriminator thus learns and encodes the relationship between 3D objects and their corresponding projected 2D images. In addition, our model does not require labelled data for training and learns in an unsupervised manner. Experiments are conducted on ISRI_DB, a database of real 3D daily-life objects, as well as on the standard ModelNet40 dataset. The experimental results demonstrate that our model effectively transforms 2D images into their corresponding 3D representations and learns a richer relationship between the two than the traditional 3D Generative Adversarial Network (3D-GAN) and Deep Autoencoder (DAE).
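This page does not include the authors' code, but the two ideas in the abstract can be sketched concretely: a generator whose loss combines the traditional GAN term with an autoencoder reconstruction term, and a discriminator that scores (2D image, 3D object) pairs as true or false correspondences rather than scoring 3D shapes alone. Below is a minimal PyTorch sketch under stated assumptions; the layer sizes, the 32^3 voxel resolution, the latent dimension, and the weight `lambda_rec` are illustrative choices, not the authors' published configuration.

```python
# Illustrative sketch of the 2D-3D-TNET objective described in the abstract.
# All architectural details here are assumptions, not the paper's exact setup.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Encode a 2D image and decode it into a voxelized 3D shape (32^3 grid)."""
    def __init__(self, latent_dim=200):
        super().__init__()
        self.encoder = nn.Sequential(              # 2D image -> latent code
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Flatten(),
            nn.LazyLinear(latent_dim),
        )
        self.decoder = nn.Sequential(              # latent code -> occupancy grid
            nn.Linear(latent_dim, 256 * 4 * 4 * 4), nn.ReLU(),
            nn.Unflatten(1, (256, 4, 4, 4)),
            nn.ConvTranspose3d(256, 128, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose3d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose3d(64, 1, 4, 2, 1), nn.Sigmoid(),
        )

    def forward(self, image):
        return self.decoder(self.encoder(image))

class CorrespondenceDiscriminator(nn.Module):
    """Score whether a (2D image, 3D voxel) pair is a true correspondence,
    instead of scoring the 3D shape alone as a conventional 3D-GAN does."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.image_branch = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Flatten(), nn.LazyLinear(feat_dim),
        )
        self.voxel_branch = nn.Sequential(
            nn.Conv3d(1, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv3d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Flatten(), nn.LazyLinear(feat_dim),
        )
        self.head = nn.Linear(2 * feat_dim, 1)     # joint real/fake logit

    def forward(self, image, voxels):
        joint = torch.cat([self.image_branch(image),
                           self.voxel_branch(voxels)], dim=1)
        return self.head(joint)

def losses(G, D, image, true_voxels, lambda_rec=10.0):
    """Traditional GAN loss plus an autoencoder (reconstruction) loss."""
    bce = nn.BCEWithLogitsLoss()
    fake_voxels = G(image)
    real_logit = D(image, true_voxels)             # true 2D-3D correspondence
    fake_logit = D(image, fake_voxels.detach())    # generated correspondence
    d_loss = bce(real_logit, torch.ones_like(real_logit)) + \
             bce(fake_logit, torch.zeros_like(fake_logit))
    g_adv = bce(D(image, fake_voxels), torch.ones_like(fake_logit))
    g_rec = nn.functional.binary_cross_entropy(fake_voxels, true_voxels)
    return d_loss, g_adv + lambda_rec * g_rec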


References

  1. Carlson, W.E.: An algorithm and data structure for 3D object synthesis using surface patch intersections. ACM SIGGRAPH Comput. Graph. 16(3), 255–263 (1982)

  2. Tangelder, J.W.H., Veltkamp, R.C.: A survey of content based 3D shape retrieval methods. In: Proceedings of Shape Modeling Applications. IEEE (2004)

  3. Van Kaick, O., et al.: A survey on shape correspondence. Comput. Graph. Forum 30(6) (2011)

  4. Chang, A.X., et al.: ShapeNet: an information-rich 3D model repository. arXiv preprint arXiv:1512.03012 (2015)

  5. Wu, Z., et al.: 3D ShapeNets: a deep representation for volumetric shapes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)

  6. Girdhar, R., et al.: Learning a predictable and generative vector representation for objects. In: European Conference on Computer Vision. Springer, Cham (2016)

  7. Qi, C.R., et al.: Volumetric and multi-view CNNs for object classification on 3D data. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)

  8. Su, H., et al.: Multi-view convolutional neural networks for 3D shape recognition. In: Proceedings of the IEEE International Conference on Computer Vision (2015)

  9. Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems (2014)

  10. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434 (2015)

  11. Sharma, A., Oliver, G., Fritz, M.: VConv-DAE: deep volumetric shape learning without object labels. In: European Conference on Computer Vision. Springer, Cham (2016)

  12. Laurentini, A.: The visual hull concept for silhouette-based image understanding. IEEE Trans. Pattern Anal. Mach. Intell. 16(2), 150–162 (1994)

  13. Woodham, R.J.: Photometric method for determining surface orientation from multiple images. Opt. Eng. 19(1), 191139 (1980)

  14. Furukawa, Y., Hernández, C.: Multi-view stereo: a tutorial. Found. Trends® Comput. Graph. Vis. 9(1-2), 1–148 (2015)

  15. Flynn, J., et al.: DeepStereo: learning to predict new views from the world’s imagery. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)

  16. Ji, D., et al.: Deep view morphing. In: Computer Vision and Pattern Recognition (CVPR), vol. 2 (2017)

  17. Garg, R., et al.: Unsupervised CNN for single view depth estimation: geometry to the rescue. In: European Conference on Computer Vision. Springer, Cham (2016)

  18. Dosovitskiy, A., Tobias Springenberg, J., Brox, T.: Learning to generate chairs with convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)

  19. Wu, J., et al.: Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In: Advances in Neural Information Processing Systems (2016)

  20. Rezende, D.J., et al.: Unsupervised learning of 3D structure from images. In: Advances in Neural Information Processing Systems (2016)

  21. Yan, X., et al.: Perspective transformer nets: learning single-view 3D object reconstruction without 3D supervision. In: Advances in Neural Information Processing Systems (2016)

  22. Maas, A.L., Hannun, A.Y., Ng, A.Y.: Rectifier nonlinearities improve neural network acoustic models. In: Proceedings ICML, vol. 30, no. 1 (2013)

  23. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

  24. Kim, J., et al.: Octree-based obstacle representation and registration for real-time. In: ICMIT 2007: Mechatronics, MEMS, and Smart Materials, vol. 6794. International Society for Optics and Photonics (2008)

  25. Baldi, P.: Autoencoders, unsupervised learning, and deep architectures. In: Proceedings of ICML Workshop on Unsupervised and Transfer Learning (2012)

  26. Intelligent Systems Research Institute Database of Household Objects (ISRI_DB). http://isrc.skku.ac.kr/DB3D/db.php


Acknowledgement

Sukhan Lee proposed the concept of transforming a single 2D image into its corresponding typical 3D representation using the 2D-3D Transformation Network, while Naeem Ul Islam implemented the concept and carried out the experimentation. This research was supported in part by the “Robot Industry Fusion Core Technology Development Project” of KEIT (10048320); in part by the “Project of e-Drive Train Platform Development for small and medium Commercial Electric Vehicles based on IoT Technology” of the Korea Institute of Energy Technology Evaluation and Planning (KETEP) (20172010000420), sponsored by the Korea Ministry of Trade, Industry and Energy (MOTIE); and in part by the MSIP under the space technology development program (NRF-2016M1A3A9005563) supervised by the NRF.

Author information

Correspondence to Naeem Ul Islam or Sukhan Lee.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Ul Islam, N., Lee, S. (2019). Learning Typical 3D Representation from a Single 2D Correspondence Using 2D-3D Transformation Network. In: Lee, S., Ismail, R., Choo, H. (eds) Proceedings of the 13th International Conference on Ubiquitous Information Management and Communication (IMCOM) 2019. IMCOM 2019. Advances in Intelligent Systems and Computing, vol 935. Springer, Cham. https://doi.org/10.1007/978-3-030-19063-7_35
