CARFF: Conditional Auto-Encoded Radiance Field for 3D Scene Forecasting

  • Conference paper
  • Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

We propose CARFF: Conditional Auto-encoded Radiance Field for 3D Scene Forecasting, a method for predicting future 3D scenes given past observations. Our method maps 2D ego-centric images to a distribution over plausible 3D latent scene configurations and predicts the evolution of hypothesized scenes through time. Our latents condition a global Neural Radiance Field (NeRF) to represent a 3D scene model, enabling explainable predictions and straightforward downstream planning. This approach models the world as a partially observable Markov decision process (POMDP) and considers complex scenarios of uncertainty in environmental states and dynamics. Specifically, we employ a two-stage training of Pose-Conditional-VAE and NeRF to learn 3D representations, and auto-regressively predict latent scene representations using a mixture density network. We demonstrate the utility of our method in scenarios using the CARLA driving simulator, where CARFF enables efficient trajectory and contingency planning in complex multi-agent autonomous driving scenarios involving occlusions. Video and code are available at: www.carff.website.

J. Yang and K. Desai—Core contributors.
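The prediction step described in the abstract — encode an image to a scene latent, then autoregressively roll the latent forward with a mixture density network (MDN) so that multiple plausible futures can be sampled — can be sketched as follows. This is a minimal NumPy illustration of the MDN rollout only; the latent size, mixture count, single-layer head, and random weights are hypothetical stand-ins, not the authors' PC-VAE or NeRF implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 8   # size of the scene latent (hypothetical choice)
N_MIX = 3        # number of Gaussian mixture components (hypothetical)

# Randomly initialised single-layer MDN "head": maps the current latent
# to mixture weights, component means, and component std-devs.
W_pi = rng.normal(size=(LATENT_DIM, N_MIX)) * 0.1
W_mu = rng.normal(size=(LATENT_DIM, N_MIX * LATENT_DIM)) * 0.1
W_sigma = rng.normal(size=(LATENT_DIM, N_MIX)) * 0.1

def mdn_predict(z):
    """Return mixture parameters (pi, mu, sigma) for the next latent."""
    logits = z @ W_pi
    pi = np.exp(logits - logits.max())
    pi /= pi.sum()                               # softmax mixture weights
    mu = (z @ W_mu).reshape(N_MIX, LATENT_DIM)   # component means
    sigma = np.exp(z @ W_sigma)                  # positive per-component scales
    return pi, mu, sigma

def sample_next_latent(z):
    """Sample one plausible next-step latent from the predicted mixture."""
    pi, mu, sigma = mdn_predict(z)
    k = rng.choice(N_MIX, p=pi)                  # pick a mixture component
    return mu[k] + sigma[k] * rng.normal(size=LATENT_DIM)

# Autoregressive rollout: feed each predicted latent back into the MDN.
z0 = rng.normal(size=LATENT_DIM)                 # stand-in for a PC-VAE encoding
trajectory = [z0]
for _ in range(5):
    trajectory.append(sample_next_latent(trajectory[-1]))

print(len(trajectory), trajectory[-1].shape)     # 6 (8,)
```

Because each step samples from a mixture rather than predicting a single point, repeated rollouts from the same encoded latent yield distinct hypothesized futures — each of which could then condition the NeRF to render an explicit 3D scene for planning.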

Author information

Correspondence to Jiezhi Yang.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 10906 KB)

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Yang, J. et al. (2025). CARFF: Conditional Auto-Encoded Radiance Field for 3D Scene Forecasting. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15147. Springer, Cham. https://doi.org/10.1007/978-3-031-73024-5_14

  • DOI: https://doi.org/10.1007/978-3-031-73024-5_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73023-8

  • Online ISBN: 978-3-031-73024-5

  • eBook Packages: Computer Science, Computer Science (R0)
