Abstract
This paper considers the problem of generative novel view synthesis (GNVS), generating novel, plausible views of a scene given a limited number of known views. Here, we propose a set-based generative model that can simultaneously generate multiple, self-consistent new views, conditioned on any number of views. Our approach is not limited to generating a single image at a time and can condition on a variable number of views. As a result, when generating a large number of views, our method is not restricted to a low-order autoregressive generation approach and is better able to maintain generated image quality over large sets of images. We evaluate our model on standard NVS datasets and show that it outperforms the state-of-the-art image-based GNVS baselines. Further, we show that the model is capable of generating sets of views that have no natural sequential ordering, like loops and binocular trajectories, and significantly outperforms other methods on such tasks. Our project page is available at: https://yorkucvil.github.io/PolyOculus-NVS/.
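For illustration only, the sketch below shows what the set-based conditional sampling interface described above might look like. This is not the authors' PolyOculus implementation; the denoiser is a hypothetical stand-in (a small MLP), and the update rule is a crude Euler-style step. It only illustrates the core idea from the abstract: a set of view latents is denoised jointly, conditioned on any number (including zero) of known views and on all camera poses, rather than generating one image at a time autoregressively.

```python
# Hypothetical sketch of set-based conditional sampling for novel view synthesis.
# NOT the PolyOculus implementation: the denoiser is a toy stand-in that only
# illustrates the interface (joint denoising of a view set, variable conditioning).
import torch
import torch.nn as nn


class ToySetDenoiser(nn.Module):
    """Stand-in for a set-based denoising network (hypothetical)."""

    def __init__(self, latent_dim=16, pose_dim=12):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + pose_dim + 1, 64),
            nn.ReLU(),
            nn.Linear(64, latent_dim),
        )

    def forward(self, latents, poses, t, known_mask):
        # latents:    (N, latent_dim) one latent per view in the set
        # poses:      (N, pose_dim)   flattened camera pose per view
        # t:          scalar diffusion time, broadcast to every set element
        # known_mask: (N,) True where the view is observed (conditioning only)
        t_feat = torch.full((latents.shape[0], 1), float(t))
        eps = self.net(torch.cat([latents, poses, t_feat], dim=-1))
        # Predict zero noise for known views; they are never resampled.
        return eps * (~known_mask).unsqueeze(-1)


@torch.no_grad()
def sample_view_set(denoiser, known_latents, known_poses, target_poses, steps=50):
    """Jointly sample latents for all target views as one set.

    Works with any number of known views, including none (unconditional case).
    """
    n_known = known_latents.shape[0]
    n_target = target_poses.shape[0]
    latents = torch.cat([known_latents,
                         torch.randn(n_target, known_latents.shape[1])])
    poses = torch.cat([known_poses, target_poses])
    known_mask = torch.cat([torch.ones(n_known, dtype=torch.bool),
                            torch.zeros(n_target, dtype=torch.bool)])
    for step in reversed(range(steps)):
        t = (step + 1) / steps
        eps = denoiser(latents, poses, t, known_mask)
        latents = latents - eps / steps          # crude Euler-style update
        latents[known_mask] = known_latents      # re-clamp conditioning views
    return latents[~known_mask]


# Example usage (all tensors are synthetic): condition on 2 known views and
# jointly sample latents for 5 novel views along an arbitrary trajectory.
generated = sample_view_set(ToySetDenoiser(),
                            torch.randn(2, 16), torch.randn(2, 12),
                            torch.randn(5, 12))
```

Because all target views are denoised as one set, the same interface covers trajectories with no natural sequential ordering, such as loops or binocular pairs, which the abstract highlights as a strength of the set-based formulation.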
Acknowledgements
This work was completed with support from the Vector Institute, and was funded in part by the Canada First Research Excellence Fund (CFREF) for the Vision: Science to Applications (VISTA) program (M.A.B., K.G.D., T.T.A.A.), the NSERC Discovery Grant program (M.A.B., K.G.D.), and the NSERC Canada Graduate Scholarship Doctoral program (J.J.Y.).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Yu, J.J., Aumentado-Armstrong, T., Forghani, F., Derpanis, K.G., Brubaker, M.A. (2025). PolyOculus: Simultaneous Multi-view Image-Based Novel View Synthesis. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15121. Springer, Cham. https://doi.org/10.1007/978-3-031-73036-8_25
DOI: https://doi.org/10.1007/978-3-031-73036-8_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73035-1
Online ISBN: 978-3-031-73036-8