Abstract
We propose CARFF: Conditional Auto-encoded Radiance Field for 3D Scene Forecasting, a method for predicting future 3D scenes given past observations. Our method maps 2D ego-centric images to a distribution over plausible 3D latent scene configurations and predicts the evolution of hypothesized scenes through time. Our latents condition a global Neural Radiance Field (NeRF) to represent a 3D scene model, enabling explainable predictions and straightforward downstream planning. This approach models the world as a partially observable Markov decision process (POMDP) and considers complex scenarios of uncertainty in environmental states and dynamics. Specifically, we employ two-stage training of a Pose-Conditional VAE and a NeRF to learn 3D representations, and auto-regressively predict latent scene representations using a mixture density network. We demonstrate the utility of our method using the CARLA driving simulator, where CARFF enables efficient trajectory and contingency planning in complex multi-agent autonomous driving scenarios involving occlusions. Video and code are available at: www.carff.website.
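To make the forecasting step concrete, the sketch below illustrates the core auto-regressive loop the abstract describes: a mixture density network (MDN) maps the current scene latent to a Gaussian mixture over plausible next latents, and sampling from that mixture, then feeding the sample back in, rolls out one hypothesized future. This is a minimal pure-Python illustration, not the authors' implementation; the `mdn_forward` callable stands in for a trained network, and in CARFF the latents would come from the Pose-Conditional VAE encoder and be decoded by the conditioned NeRF.

```python
import random


def sample_gaussian(mu, sigma):
    """Sample element-wise from N(mu, sigma^2) (diagonal covariance)."""
    return [m + s * random.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def mdn_sample(weights, means, sigmas):
    """Draw one latent from a Gaussian mixture: pick a mode by its
    mixture weight, then sample that mode's Gaussian. Each mode can
    represent a distinct hypothesis (e.g. an occluded car present or not)."""
    k = random.choices(range(len(weights)), weights=weights)[0]
    return sample_gaussian(means[k], sigmas[k])


def rollout(z0, mdn_forward, horizon):
    """Auto-regressively forecast scene latents for `horizon` steps,
    feeding each sampled latent back in as the next input."""
    zs = [z0]
    for _ in range(horizon):
        weights, means, sigmas = mdn_forward(zs[-1])
        zs.append(mdn_sample(weights, means, sigmas))
    return zs
```

Because each step samples from a multi-modal distribution rather than committing to a single point estimate, repeated rollouts yield diverse scene hypotheses, which is what makes contingency planning under occlusion possible downstream.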
J. Yang and K. Desai—Core contributors.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Yang, J. et al. (2025). CARFF: Conditional Auto-Encoded Radiance Field for 3D Scene Forecasting. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15147. Springer, Cham. https://doi.org/10.1007/978-3-031-73024-5_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73023-8
Online ISBN: 978-3-031-73024-5