
Learning Typical 3D Representation from a Single 2D Correspondence Using 2D-3D Transformation Network

  • Conference paper
  • In: Proceedings of the 13th International Conference on Ubiquitous Information Management and Communication (IMCOM) 2019 (IMCOM 2019)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 935)

Abstract

Understanding 3D environments with deep learning is a subject of great interest in the computer vision community due to its wide variety of applications. However, learning good 3D representations is challenging for many reasons, including the high dimensionality of the data, the modeling complexity introduced by symmetric objects, the required computational cost, and expected variations that scale up by an order of 1.5 compared to learning 2D representations. In this paper, we address the problem of learning a typical 3D representation by transformation from its corresponding single 2D representation, using a deep-autoencoder-based Generative Adversarial Network termed the 2D-3D Transformation Network (2D-3D-TNET). The proposed model's objective combines the traditional GAN loss with an autoencoder loss, which allows the generator to effectively generate 3D objects corresponding to the input 2D images. Furthermore, instead of training the discriminator to distinguish real from generated images, we let it take both a 2D image and a 3D object as input and learn to discriminate whether they are a true correspondence. The discriminator thus learns and encodes the relationship between 3D objects and their corresponding projected 2D images. In addition, our model does not require labelled data for training and learns in an unsupervised manner. Experiments are conducted on ISRI_DB, a database of real 3D daily-life objects, as well as on the standard ModelNet40 dataset. The experimental results demonstrate that our model effectively transforms 2D images into their corresponding 3D representations and learns a richer relationship between the two than the traditional 3D Generative Adversarial Network (3D-GAN) and Deep Autoencoder (DAE).
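This page does not include the authors' code, but the two ideas in the abstract can be sketched concretely: a generator whose loss combines the traditional GAN term with an autoencoder reconstruction term, and a discriminator that scores (2D image, 3D object) pairs as true or false correspondences rather than scoring 3D shapes alone. Below is a minimal PyTorch sketch under stated assumptions; the layer sizes, the 32^3 voxel resolution, the latent dimension, and the weight `lambda_rec` are illustrative choices, not the authors' published configuration.

```python
# Illustrative sketch of the 2D-3D-TNET objective described in the abstract.
# All architectural details here are assumptions, not the paper's exact setup.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Encode a 2D image and decode it into a voxelized 3D shape (32^3 grid)."""
    def __init__(self, latent_dim=200):
        super().__init__()
        self.encoder = nn.Sequential(              # 2D image -> latent code
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Flatten(),
            nn.LazyLinear(latent_dim),
        )
        self.decoder = nn.Sequential(              # latent code -> occupancy grid
            nn.Linear(latent_dim, 256 * 4 * 4 * 4), nn.ReLU(),
            nn.Unflatten(1, (256, 4, 4, 4)),
            nn.ConvTranspose3d(256, 128, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose3d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose3d(64, 1, 4, 2, 1), nn.Sigmoid(),
        )

    def forward(self, image):
        return self.decoder(self.encoder(image))

class CorrespondenceDiscriminator(nn.Module):
    """Score whether a (2D image, 3D voxel) pair is a true correspondence,
    instead of scoring the 3D shape alone as a conventional 3D-GAN does."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.image_branch = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Flatten(), nn.LazyLinear(feat_dim),
        )
        self.voxel_branch = nn.Sequential(
            nn.Conv3d(1, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv3d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Flatten(), nn.LazyLinear(feat_dim),
        )
        self.head = nn.Linear(2 * feat_dim, 1)     # joint real/fake logit

    def forward(self, image, voxels):
        joint = torch.cat([self.image_branch(image),
                           self.voxel_branch(voxels)], dim=1)
        return self.head(joint)

def losses(G, D, image, true_voxels, lambda_rec=10.0):
    """Traditional GAN loss plus an autoencoder (reconstruction) loss."""
    bce = nn.BCEWithLogitsLoss()
    fake_voxels = G(image)
    real_logit = D(image, true_voxels)             # true 2D-3D correspondence
    fake_logit = D(image, fake_voxels.detach())    # generated correspondence
    d_loss = bce(real_logit, torch.ones_like(real_logit)) + \
             bce(fake_logit, torch.zeros_like(fake_logit))
    g_adv = bce(D(image, fake_voxels), torch.ones_like(fake_logit))
    g_rec = nn.functional.binary_cross_entropy(fake_voxels, true_voxels)
    return d_loss, g_adv + lambda_rec * g_rec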


References

  1. Carlson, W.E.: An algorithm and data structure for 3D object synthesis using surface patch intersections. ACM SIGGRAPH Comput. Graph. 16(3), 255–263 (1982)

  2. Tangelder, J.W.H., Veltkamp, R.C.: A survey of content based 3D shape retrieval methods. In: Proceedings of Shape Modeling Applications. IEEE (2004)

  3. Van Kaick, O., et al.: A survey on shape correspondence. Comput. Graph. Forum 30(6) (2011)

  4. Chang, A.X., et al.: ShapeNet: an information-rich 3D model repository. arXiv preprint arXiv:1512.03012 (2015)

  5. Wu, Z., et al.: 3D ShapeNets: a deep representation for volumetric shapes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)

  6. Girdhar, R., et al.: Learning a predictable and generative vector representation for objects. In: European Conference on Computer Vision. Springer, Cham (2016)

  7. Qi, C.R., et al.: Volumetric and multi-view CNNs for object classification on 3D data. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)

  8. Su, H., et al.: Multi-view convolutional neural networks for 3D shape recognition. In: Proceedings of the IEEE International Conference on Computer Vision (2015)

  9. Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems (2014)

  10. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434 (2015)

  11. Sharma, A., Oliver, G., Fritz, M.: VConv-DAE: deep volumetric shape learning without object labels. In: European Conference on Computer Vision. Springer, Cham (2016)

  12. Laurentini, A.: The visual hull concept for silhouette-based image understanding. IEEE Trans. Pattern Anal. Mach. Intell. 16(2), 150–162 (1994)

  13. Woodham, R.J.: Photometric method for determining surface orientation from multiple images. Opt. Eng. 19(1), 191139 (1980)

  14. Furukawa, Y., Hernández, C.: Multi-view stereo: a tutorial. Found. Trends® Comput. Graph. Vis. 9(1-2), 1–148 (2015)

  15. Flynn, J., et al.: DeepStereo: learning to predict new views from the world’s imagery. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)

  16. Ji, D., et al.: Deep view morphing. In: Computer Vision and Pattern Recognition (CVPR), vol. 2 (2017)

  17. Garg, R., et al.: Unsupervised CNN for single view depth estimation: geometry to the rescue. In: European Conference on Computer Vision. Springer, Cham (2016)

  18. Dosovitskiy, A., Tobias Springenberg, J., Brox, T.: Learning to generate chairs with convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)

  19. Wu, J., et al.: Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In: Advances in Neural Information Processing Systems (2016)

  20. Rezende, D.J., et al.: Unsupervised learning of 3D structure from images. In: Advances in Neural Information Processing Systems (2016)

  21. Yan, X., et al.: Perspective transformer nets: learning single-view 3D object reconstruction without 3D supervision. In: Advances in Neural Information Processing Systems (2016)

  22. Maas, A.L., Hannun, A.Y., Ng, A.Y.: Rectifier nonlinearities improve neural network acoustic models. In: Proceedings ICML, vol. 30, no. 1 (2013)

  23. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

  24. Kim, J., et al.: Octree-based obstacle representation and registration for real-time. In: ICMIT 2007: Mechatronics, MEMS, and Smart Materials, vol. 6794. International Society for Optics and Photonics (2008)

  25. Baldi, P.: Autoencoders, unsupervised learning, and deep architectures. In: Proceedings of ICML Workshop on Unsupervised and Transfer Learning (2012)

  26. Intelligent Systems Research Institute Database of Household Objects (ISRI_DB). http://isrc.skku.ac.kr/DB3D/db.php


Acknowledgement

Sukhan Lee proposed the concept of transforming a single 2D image into its corresponding typical 3D representation using the 2D-3D Transformation Network, while Naeem Ul Islam implemented the concept and carried out the experimentation. This research was supported in part by the “Robot Industry Fusion Core Technology Development Project” of KEIT (10048320); in part by the “Project of e-Drive Train Platform Development for small and medium Commercial Electric Vehicles based on IoT Technology” of the Korea Institute of Energy Technology Evaluation and Planning (KETEP) (20172010000420), sponsored by the Korea Ministry of Trade, Industry and Energy (MOTIE); and in part by the MSIP under the space technology development program (NRF-2016M1A3A9005563) supervised by the NRF.

Author information

Correspondence to Naeem Ul Islam or Sukhan Lee.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Ul Islam, N., Lee, S. (2019). Learning Typical 3D Representation from a Single 2D Correspondence Using 2D-3D Transformation Network. In: Lee, S., Ismail, R., Choo, H. (eds) Proceedings of the 13th International Conference on Ubiquitous Information Management and Communication (IMCOM) 2019. IMCOM 2019. Advances in Intelligent Systems and Computing, vol 935. Springer, Cham. https://doi.org/10.1007/978-3-030-19063-7_35
