Abstract
We propose a fully convolutional conditional generative neural network, the latent transformation neural network, capable of rigid and non-rigid object view synthesis using a lightweight architecture suited for real-time applications and embedded systems. In contrast to existing object view synthesis methods, which incorporate conditioning information via concatenation, we introduce a dedicated network component, the conditional transformation unit, designed to learn the latent space transformations corresponding to specified target views. In addition, a consistency loss term is defined to guide the network toward learning the desired latent space mappings, a task-divided decoder is constructed to refine the quality of generated views, and an adaptive discriminator is introduced to improve the adversarial training process. The generalizability of the proposed methodology is demonstrated on three diverse tasks: multi-view synthesis on real hand depth images, view synthesis of real and synthetic faces, and the rotation of rigid objects. The proposed model is shown to be comparable to state-of-the-art methods in the structural similarity index measure (SSIM) and \(L_{1}\) metrics while simultaneously achieving a 24% reduction in the compute time for inference of novel images.
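The core idea of the conditional transformation unit (conditioning by transforming the latent code rather than concatenating a label onto it) and the accompanying consistency loss can be illustrated with a toy NumPy sketch. This is not the paper's implementation: the dimensions, the per-view linear maps, and all names below are invented for illustration, and the actual model uses learned convolutional components trained end to end.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, not taken from the paper.
LATENT_DIM = 8
NUM_VIEWS = 4


class ConditionalTransformationUnit:
    """Toy stand-in: one linear map per discrete target view, applied to
    the encoder's latent code instead of concatenating a view label."""

    def __init__(self, latent_dim, num_views, rng):
        # Initialize each view's transformation near the identity so the
        # unit starts as an (approximate) pass-through.
        self.maps = 0.1 * rng.standard_normal((num_views, latent_dim, latent_dim))
        self.maps += np.eye(latent_dim)

    def __call__(self, z, view):
        # Map the source latent code to the target view's latent code.
        return self.maps[view] @ z


def consistency_loss(ctu, z_source, z_target, view):
    # Penalize the mismatch between the transformed source code and the
    # latent code the encoder produces for the true target view.
    diff = ctu(z_source, view) - z_target
    return float(np.mean(diff ** 2))


ctu = ConditionalTransformationUnit(LATENT_DIM, NUM_VIEWS, rng)
z_src = rng.standard_normal(LATENT_DIM)   # stand-in for encoder(source image)
z_tgt = rng.standard_normal(LATENT_DIM)   # stand-in for encoder(target view)
loss = consistency_loss(ctu, z_src, z_tgt, view=2)
print(f"consistency loss: {loss:.4f}")
```

Minimizing this loss over training pairs is what drives the unit toward the desired latent-space mapping for each target view; in the full model the transformation is a learned network component rather than a fixed matrix.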
Acknowledgements
Karthik Ramani acknowledges the US National Science Foundation Awards NRI-1637961 and IIP-1632154. Guang Lin acknowledges the US National Science Foundation Awards DMS-1555072, DMS-1736364, and DMS-1821233. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agency. We gratefully acknowledge the support of NVIDIA Corporation through the donation of GPUs used for this research.
Cite this article
Kim, S., Winovich, N., Chi, HG. et al. Latent transformations neural network for object view synthesis. Vis Comput 36, 1663–1677 (2020). https://doi.org/10.1007/s00371-019-01755-x