Abstract
This paper considers the problem of generative novel view synthesis (GNVS), generating novel, plausible views of a scene given a limited number of known views. Here, we propose a set-based generative model that can simultaneously generate multiple, self-consistent new views, conditioned on any number of views. Our approach is not limited to generating a single image at a time and can condition on a variable number of views. As a result, when generating a large number of views, our method is not restricted to a low-order autoregressive generation approach and is better able to maintain generated image quality over large sets of images. We evaluate our model on standard NVS datasets and show that it outperforms the state-of-the-art image-based GNVS baselines. Further, we show that the model is capable of generating sets of views that have no natural sequential ordering, like loops and binocular trajectories, and significantly outperforms other methods on such tasks. Our project page is available at: https://yorkucvil.github.io/PolyOculus-NVS/.
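For illustration only, the sketch below shows what the set-based conditional sampling interface described above might look like. This is not the authors' PolyOculus implementation; the denoiser is a hypothetical stand-in (a small MLP), and the update rule is a crude Euler-style step. It only illustrates the core idea from the abstract: a set of view latents is denoised jointly, conditioned on any number (including zero) of known views and on all camera poses, rather than generating one image at a time autoregressively.

```python
# Hypothetical sketch of set-based conditional sampling for novel view synthesis.
# NOT the PolyOculus implementation: the denoiser is a toy stand-in that only
# illustrates the interface (joint denoising of a view set, variable conditioning).
import torch
import torch.nn as nn


class ToySetDenoiser(nn.Module):
    """Stand-in for a set-based denoising network (hypothetical)."""

    def __init__(self, latent_dim=16, pose_dim=12):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + pose_dim + 1, 64),
            nn.ReLU(),
            nn.Linear(64, latent_dim),
        )

    def forward(self, latents, poses, t, known_mask):
        # latents:    (N, latent_dim) one latent per view in the set
        # poses:      (N, pose_dim)   flattened camera pose per view
        # t:          scalar diffusion time, broadcast to every set element
        # known_mask: (N,) True where the view is observed (conditioning only)
        t_feat = torch.full((latents.shape[0], 1), float(t))
        eps = self.net(torch.cat([latents, poses, t_feat], dim=-1))
        # Predict zero noise for known views; they are never resampled.
        return eps * (~known_mask).unsqueeze(-1)


@torch.no_grad()
def sample_view_set(denoiser, known_latents, known_poses, target_poses, steps=50):
    """Jointly sample latents for all target views as one set.

    Works with any number of known views, including none (unconditional case).
    """
    n_known = known_latents.shape[0]
    n_target = target_poses.shape[0]
    latents = torch.cat([known_latents,
                         torch.randn(n_target, known_latents.shape[1])])
    poses = torch.cat([known_poses, target_poses])
    known_mask = torch.cat([torch.ones(n_known, dtype=torch.bool),
                            torch.zeros(n_target, dtype=torch.bool)])
    for step in reversed(range(steps)):
        t = (step + 1) / steps
        eps = denoiser(latents, poses, t, known_mask)
        latents = latents - eps / steps          # crude Euler-style update
        latents[known_mask] = known_latents      # re-clamp conditioning views
    return latents[~known_mask]


# Example usage (all tensors are synthetic): condition on 2 known views and
# jointly sample latents for 5 novel views along an arbitrary trajectory.
generated = sample_view_set(ToySetDenoiser(),
                            torch.randn(2, 16), torch.randn(2, 12),
                            torch.randn(5, 12))
```

Because all target views are denoised as one set, the same interface covers trajectories with no natural sequential ordering, such as loops or binocular pairs, which the abstract highlights as a strength of the set-based formulation.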
Acknowledgements
This work was completed with support from the Vector Institute, and was funded in part by the Canada First Research Excellence Fund (CFREF) for the Vision: Science to Applications (VISTA) program (M.A.B., K.G.D., T.T.A.A.), the NSERC Discovery Grant program (M.A.B., K.G.D.), and the NSERC Canada Graduate Scholarship Doctoral program (J.J.Y.).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Yu, J.J., Aumentado-Armstrong, T., Forghani, F., Derpanis, K.G., Brubaker, M.A. (2025). PolyOculus: Simultaneous Multi-view Image-Based Novel View Synthesis. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15121. Springer, Cham. https://doi.org/10.1007/978-3-031-73036-8_25
DOI: https://doi.org/10.1007/978-3-031-73036-8_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73035-1
Online ISBN: 978-3-031-73036-8