Abstract
Novel view synthesis from an in-the-wild video is difficult due to challenges like scene dynamics and lack of parallax. While existing methods have shown promising results with implicit neural radiance fields, they are slow to train and render. This paper revisits explicit video representations to synthesize high-quality novel views from a monocular video efficiently. We treat static and dynamic video content separately. Specifically, we build a global static scene model using an extended plane-based scene representation to synthesize temporally coherent novel video. Our plane-based scene representation is augmented with spherical harmonics and displacement maps to capture view-dependent effects and model non-planar complex surface geometry. We opt to represent the dynamic content as per-frame point clouds for efficiency. While such representations are prone to inconsistency, minor temporal inconsistencies are perceptually masked by motion. We develop a method to quickly estimate such a hybrid video representation and render novel views in real time. Our experiments show that our method renders novel views from an in-the-wild video with quality comparable to state-of-the-art methods while being 100× faster to train and enabling real-time rendering. Project page: https://casual-fvs.github.io.
Y.-C. Lee: Work done while Yao-Chih Lee was an intern at Adobe Research.
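The abstract describes view-dependent shading on the static planes via low-order spherical harmonics (SH). As a minimal sketch of how per-texel SH coefficients could be turned into an RGB color for a given viewing direction, the snippet below evaluates the standard degree-2 real SH basis; the SH degree, the (9, 3) coefficient layout, and the function name are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Standard real SH basis constants for degrees 0-2.
SH_C0 = 0.28209479177387814
SH_C1 = 0.4886025119029199
SH_C2 = (1.0925484305920792, -1.0925484305920792,
         0.31539156525252005, -1.0925484305920792,
         0.5462742152960396)

def eval_sh_deg2(coeffs, view_dir):
    """Evaluate a degree-2 real SH color for one viewing direction.

    coeffs:   (9, 3) array of per-texel RGB SH coefficients (hypothetical layout).
    view_dir: (3,) unit vector from the surface point toward the camera.
    """
    x, y, z = view_dir
    basis = np.array([
        SH_C0,                                          # l = 0 (view-independent DC term)
        -SH_C1 * y, SH_C1 * z, -SH_C1 * x,              # l = 1
        SH_C2[0] * x * y,                               # l = 2
        SH_C2[1] * y * z,
        SH_C2[2] * (2.0 * z * z - x * x - y * y),
        SH_C2[3] * x * z,
        SH_C2[4] * (x * x - y * y),
    ])
    return basis @ coeffs                               # (3,) RGB

# Example: a texel whose DC term encodes mid-gray, viewed head-on.
coeffs = np.zeros((9, 3))
coeffs[0] = 0.5 / SH_C0                                 # DC coefficient for RGB = 0.5
print(eval_sh_deg2(coeffs, np.array([0.0, 0.0, 1.0])))  # -> [0.5 0.5 0.5]
```

Only the higher-order terms depend on the viewing direction, so a texel's appearance can shift as the camera moves while the DC term carries its base color.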
Notes
1. The training time does not include SfM preprocessing time (e.g., COLMAP [62] or video-depth-pose estimation) for any of the compared methods.
References
Aliev, K.A., Sevastopolsky, A., Kolos, M., Ulyanov, D., Lempitsky, V.: Neural point-based graphics. In: ECCV (2020)
Attal, B., Huang, J.B., Richardt, C., Zollhoefer, M., Kopf, J., O’Toole, M., Kim, C.: HyperReel: high-fidelity 6-DoF video with ray-conditioned sampling. In: CVPR (2023)
Bansal, A., Vo, M., Sheikh, Y., Ramanan, D., Narasimhan, S.: 4D visualization of dynamic events from unconstrained multi-view videos. In: CVPR (2020)
Bansal, A., Zollhoefer, M.: Neural pixel composition for 3D-4D view synthesis from multi-views. In: CVPR (2023)
Bemana, M., Myszkowski, K., Seidel, H.P., Ritschel, T.: X-Fields: implicit neural view-, light- and time-image interpolation. In: SIGGRAPH Asia (2020)
Bian, W., Wang, Z., Li, K., Bian, J.W., Prisacariu, V.A.: NoPe-NeRF: optimising neural radiance field with no pose prior. In: CVPR (2023)
Büsching, M., Bengtson, J., Nilsson, D., Björkman, M.: FlowIBR: leveraging pre-training for efficient neural image-based rendering of dynamic scenes. arXiv preprint arXiv:2309.05418 (2023)
Cao, A., Johnson, J.: HexPlane: a fast representation for dynamic scenes. In: CVPR (2023)
Cao, A., Rockwell, C., Johnson, J.: FWD: real-time novel view synthesis with forward warping and depth. In: CVPR (2022)
Das, D., Wewer, C., Yunus, R., Ilg, E., Lenssen, J.E.: Neural parametric Gaussians for monocular non-rigid object reconstruction. arXiv preprint arXiv:2312.01196 (2023)
Flynn, J., et al.: DeepView: view synthesis with learned gradient descent. In: CVPR (2019)
Fridovich-Keil, S., Meanti, G., Warburg, F.R., Recht, B., Kanazawa, A.: K-Planes: explicit radiance fields in space, time, and appearance. In: CVPR (2023)
Gao, C., Saraf, A., Kopf, J., Huang, J.B.: Dynamic view synthesis from dynamic monocular video. In: ICCV (2021)
Gao, H., Li, R., Tulsiani, S., Russell, B., Kanazawa, A.: Monocular dynamic view synthesis: a reality check. In: NeurIPS (2022)
Godard, C., Mac Aodha, O., Brostow, G.J.: Unsupervised monocular depth estimation with left-right consistency. In: CVPR (2017)
Gortler, S.J., Grzeszczuk, R., Szeliski, R., Cohen, M.F.: The lumigraph. In: SIGGRAPH (1996)
Han, Y., Wang, R., Yang, J.: Single-view view synthesis in the wild with learned adaptive multiplane images. In: SIGGRAPH (2022)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV (2017)
Hu, R., Ravi, N., Berg, A.C., Pathak, D.: Worldsheet: wrapping the world in a 3D sheet for view synthesis from a single image. In: ICCV (2021)
Huang, Y.H., Sun, Y.T., Yang, Z., Lyu, X., Cao, Y.P., Qi, X.: SC-GS: sparse-controlled Gaussian splatting for editable dynamic scenes. arXiv preprint arXiv:2312.14937 (2023)
Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer and super-resolution. In: ECCV (2016)
Katsumata, K., Vo, D.M., Nakayama, H.: An efficient 3D Gaussian representation for monocular/multi-view dynamic scenes. arXiv preprint arXiv:2311.12897 (2023)
Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3D Gaussian splatting for real-time radiance field rendering. In: ACM TOG (2023)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2015)
Kirillov, A., et al.: Segment anything. arXiv preprint arXiv:2304.02643 (2023)
Kopf, J., et al.: One shot 3D photography. In: SIGGRAPH (2020)
Kopf, J., Rong, X., Huang, J.B.: Robust consistent video depth estimation. In: CVPR (2021)
Kratimenos, A., Lei, J., Daniilidis, K.: DynMF: neural motion factorization for real-time dynamic view synthesis with 3D Gaussian splatting. arXiv preprint (2023)
Lee, Y.C., Tseng, K.W., Chen, Y.T., Chen, C.C., Chen, C.S., Hung, Y.P.: 3d video stabilization with depth estimation by CNN-based optimization. In: CVPR (2021)
Levoy, M., Hanrahan, P.: Light field rendering. In: SIGGRAPH (1996)
Li, T., et al.: Neural 3d video synthesis from multi-view video. In: CVPR (2022)
Li, X., Cao, Z., Sun, H., Zhang, J., Xian, K., Lin, G.: 3D cinemagraphy from a single image. In: CVPR (2023)
Li, Z., Chen, Z., Li, Z., Xu, Y.: Spacetime Gaussian feature splatting for real-time dynamic view synthesis. arXiv preprint arXiv:2312.16812 (2023)
Li, Z., Niklaus, S., Snavely, N., Wang, O.: Neural scene flow fields for space-time view synthesis of dynamic scenes. In: CVPR (2021)
Li, Z., Wang, Q., Cole, F., Tucker, R., Snavely, N.: DynIBaR: neural dynamic image-based rendering. In: CVPR (2023)
Liang, Y., et al.: GauFRe: Gaussian deformation fields for real-time dynamic novel view synthesis. arXiv preprint arXiv:2312.11458 (2023)
Lin, C.H., Ma, W.C., Torralba, A., Lucey, S.: BARF: bundle-adjusting neural radiance fields. In: ICCV (2021)
Lin, H., et al.: High-fidelity and real-time novel view synthesis for dynamic scenes. In: SIGGRAPH Asia (2023)
Lin, K.E., Xiao, L., Liu, F., Yang, G., Ramamoorthi, R.: Deep 3D mask volume for view synthesis of dynamic scenes. In: ICCV (2021)
Lin, Y., Dai, Z., Zhu, S., Yao, Y.: Gaussian-Flow: 4D reconstruction with dynamic 3D Gaussian particle. arXiv preprint arXiv:2312.03431 (2023)
Lin, Z.H., Ma, W.C., Hsu, H.Y., Wang, Y.C.F., Wang, S.: NeurMiPs: neural mixture of planar experts for view synthesis. In: CVPR (2022)
Ling, S.Z., Sharp, N., Jacobson, A.: VectorAdam for rotation equivariant geometry optimization. In: NeurIPS (2022)
Liu, Y.L., et al.: Robust dynamic radiance fields. In: CVPR (2023)
Lu, E., Cole, F., Dekel, T., Zisserman, A., Freeman, W.T., Rubinstein, M.: Omnimatte: associating objects and their effects in video. In: CVPR (2021)
Luiten, J., Kopanas, G., Leibe, B., Ramanan, D.: Dynamic 3D Gaussians: tracking by persistent dynamic view synthesis. In: 3DV (2024)
Martin-Brualla, R., Radwan, N., Sajjadi, M.S., Barron, J.T., Dosovitskiy, A., Duckworth, D.: NeRF in the wild: neural radiance fields for unconstrained photo collections. In: CVPR (2021)
Meuleman, A., et al.: Progressively optimized local radiance fields for robust view synthesis. In: CVPR (2023)
Mildenhall, B., et al.: Local light field fusion: practical view synthesis with prescriptive sampling guidelines. In: ACM TOG (2019)
Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: representing scenes as neural radiance fields for view synthesis. In: ECCV (2020)
Niklaus, S., Hu, P., Chen, J.: Splatting-based synthesis for video frame interpolation. In: WACV (2023)
Niklaus, S., Liu, F.: Softmax splatting for video frame interpolation. In: CVPR (2020)
Niklaus, S., Mai, L., Yang, J., Liu, F.: 3D Ken Burns effect from a single image. In: ACM TOG (2019)
Park, K., et al.: Nerfies: deformable neural radiance fields. In: ICCV (2021)
Park, K., et al.: HyperNeRF: a higher-dimensional representation for topologically varying neural radiance fields. In: ACM TOG (2021)
Peng, J., Zhang, J., Luo, X., Lu, H., Xian, K., Cao, Z.: MPIB: an MPI-based bokeh rendering framework for realistic partial occlusion effects. In: ECCV (2022)
Perazzi, F., Pont-Tuset, J., McWilliams, B., Van Gool, L., Gross, M., Sorkine-Hornung, A.: A benchmark dataset and evaluation methodology for video object segmentation. In: CVPR (2016)
Philip, J., Deschaintre, V.: Floaters no more: radiance field gradient scaling for improved near-camera training. In: Eurographics Symposium on Rendering (2023)
Pumarola, A., Corona, E., Pons-Moll, G., Moreno-Noguer, F.: D-NeRF: neural radiance fields for dynamic scenes. In: CVPR (2021)
Ramamoorthi, R., Hanrahan, P.: An efficient representation for irradiance environment maps. In: SIGGRAPH (2001)
Ren, Y., Zhang, T., Pollefeys, M., Süsstrunk, S., Wang, F.: VolRecon: volume rendering of signed ray distance functions for generalizable multi-view reconstruction. In: CVPR (2023)
Rockwell, C., Fouhey, D.F., Johnson, J.: PixelSynth: generating a 3D-consistent experience from a single image. In: ICCV (2021)
Schonberger, J.L., Frahm, J.M.: Structure-from-motion revisited. In: CVPR (2016)
Shade, J., Gortler, S., He, L.W., Szeliski, R.: Layered depth images. In: SIGGRAPH (1998)
Shih, M.L., Su, S.Y., Kopf, J., Huang, J.B.: 3D photography using context-aware layered depth inpainting. In: CVPR (2020)
Sinha, S., Steedly, D., Szeliski, R.: Piecewise planar stereo for image-based rendering. In: ICCV (2009)
Song, L., et al.: NeRFPlayer: a streamable dynamic scene representation with decomposed neural radiance fields. In: IEEE TVCG (2023)
Srinivasan, P.P., Tucker, R., Barron, J.T., Ramamoorthi, R., Ng, R., Snavely, N.: Pushing the boundaries of view extrapolation with multiplane images. In: CVPR (2019)
Stich, T., Linz, C., Albuquerque, G., Magnor, M.: View and time interpolation in image space. In: Computer Graphics Forum (2008)
Suhail, M., Esteves, C., Sigal, L., Makadia, A.: Light field neural rendering. In: CVPR (2022)
Teed, Z., Deng, J.: RAFT: recurrent all-pairs field transforms for optical flow. In: ECCV (2020)
Tian, F., Du, S., Duan, Y.: MonoNeRF: learning a generalizable dynamic radiance field from monocular videos. In: ICCV (2023)
Tretschk, E., Tewari, A., Golyanik, V., Zollhöfer, M., Lassner, C., Theobalt, C.: Non-rigid neural radiance fields: reconstruction and novel view synthesis of a dynamic scene from monocular video. In: ICCV (2021)
Tucker, R., Snavely, N.: Single-view view synthesis with multiplane images. In: CVPR (2020)
Wang, P., Liu, L., Liu, Y., Theobalt, C., Komura, T., Wang, W.: NeuS: learning neural implicit surfaces by volume rendering for multi-view reconstruction. In: NeurIPS (2021)
Wang, Q., et al.: IBRNet: learning multi-view image-based rendering. In: CVPR (2021)
Wang, Y., Han, Q., Habermann, M., Daniilidis, K., Theobalt, C., Liu, L.: NeuS2: fast learning of neural implicit surfaces for multi-view reconstruction. In: ICCV (2023)
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)
Wiles, O., Gkioxari, G., Szeliski, R., Johnson, J.: SynSin: end-to-end view synthesis from a single image. In: CVPR (2020)
Wizadwongsa, S., Phongthawee, P., Yenphraphai, J., Suwajanakorn, S.: NEX: real-time view synthesis with neural basis expansion. In: CVPR (2021)
Wu, G., et al.: 4D Gaussian splatting for real-time dynamic scene rendering. arXiv preprint arXiv:2310.08528 (2023)
Xian, W., Huang, J.B., Kopf, J., Kim, C.: Space-time neural irradiance fields for free-viewpoint video. In: CVPR (2021)
Yang, Z., Yang, H., Pan, Z., Zhang, L.: Real-time photorealistic dynamic scene representation and rendering with 4D Gaussian splatting. In: ICLR (2024)
Yang, Z., Gao, X., Zhou, W., Jiao, S., Zhang, Y., Jin, X.: Deformable 3D Gaussians for high-fidelity monocular dynamic scene reconstruction. arXiv preprint arXiv:2309.13101 (2023)
Yariv, L., et al.: BakedSDF: meshing neural SDFs for real-time view synthesis. arXiv preprint arXiv:2302.14859 (2023)
Yoon, J.S., Kim, K., Gallo, O., Park, H.S., Kautz, J.: Novel view synthesis of dynamic scenes with globally coherent depths from a monocular camera. In: CVPR (2020)
Zhang, M., Wang, J., Li, X., Huang, Y., Sato, Y., Lu, Y.: Structural multiplane image: bridging neural view synthesis and 3d reconstruction. In: CVPR (2023)
Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: CVPR (2018)
Zhang, Z., Cole, F., Li, Z., Rubinstein, M., Snavely, N., Freeman, W.T.: Structure and motion from casual videos. In: ECCV (2022)
Zhang, Z., Cole, F., Tucker, R., Freeman, W.T., Dekel, T.: Consistent depth of moving objects in video. In: ACM TOG (2021)
Zhao, X., Colburn, A., Ma, F., Bautista, M.A., Susskind, J.M., Schwing, A.G.: Pseudo-generalized dynamic view synthesis from a video. In: ICLR (2024)
Zhou, T., Tucker, R., Flynn, J., Fyffe, G., Snavely, N.: Stereo magnification: learning view synthesis using multiplane images. In: SIGGRAPH (2018)
Zitnick, C.L., Kang, S.B., Uyttendaele, M., Winder, S., Szeliski, R.: High-quality video view interpolation using a layered representation. In: ACM TOG (2004)
Cite this paper
Lee, Y.-C., et al. (2025). Fast View Synthesis of Casual Videos with Soup-of-Planes. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds.) Computer Vision – ECCV 2024. Lecture Notes in Computer Science, vol. 15096. Springer, Cham. https://doi.org/10.1007/978-3-031-72920-1_16