Abstract
We propose CARFF: Conditional Auto-encoded Radiance Field for 3D Scene Forecasting, a method for predicting future 3D scenes given past observations. Our method maps 2D ego-centric images to a distribution over plausible 3D latent scene configurations and predicts the evolution of hypothesized scenes through time. Our latents condition a global Neural Radiance Field (NeRF) to represent a 3D scene model, enabling explainable predictions and straightforward downstream planning. This approach models the world as a partially observable Markov decision process (POMDP) and considers complex scenarios of uncertainty in environmental states and dynamics. Specifically, we employ two-stage training of a Pose-Conditional VAE and a NeRF to learn 3D representations, and auto-regressively predict latent scene representations using a mixture density network. We demonstrate the utility of our method using the CARLA driving simulator, where CARFF enables efficient trajectory and contingency planning in complex multi-agent autonomous driving scenarios involving occlusions. Video and code are available at: www.carff.website.
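To make the forecasting step concrete, the sketch below illustrates the core auto-regressive loop the abstract describes: a mixture density network (MDN) maps the current scene latent to a Gaussian mixture over plausible next latents, and sampling from that mixture, then feeding the sample back in, rolls out one hypothesized future. This is a minimal pure-Python illustration, not the authors' implementation; the `mdn_forward` callable stands in for a trained network, and in CARFF the latents would come from the Pose-Conditional VAE encoder and be decoded by the conditioned NeRF.

```python
import random


def sample_gaussian(mu, sigma):
    """Sample element-wise from N(mu, sigma^2) (diagonal covariance)."""
    return [m + s * random.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def mdn_sample(weights, means, sigmas):
    """Draw one latent from a Gaussian mixture: pick a mode by its
    mixture weight, then sample that mode's Gaussian. Each mode can
    represent a distinct hypothesis (e.g. an occluded car present or not)."""
    k = random.choices(range(len(weights)), weights=weights)[0]
    return sample_gaussian(means[k], sigmas[k])


def rollout(z0, mdn_forward, horizon):
    """Auto-regressively forecast scene latents for `horizon` steps,
    feeding each sampled latent back in as the next input."""
    zs = [z0]
    for _ in range(horizon):
        weights, means, sigmas = mdn_forward(zs[-1])
        zs.append(mdn_sample(weights, means, sigmas))
    return zs
```

Because each step samples from a multi-modal distribution rather than committing to a single point estimate, repeated rollouts yield diverse scene hypotheses, which is what makes contingency planning under occlusion possible downstream.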
J. Yang and K. Desai—Core contributors.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Yang, J. et al. (2025). CARFF: Conditional Auto-Encoded Radiance Field for 3D Scene Forecasting. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15147. Springer, Cham. https://doi.org/10.1007/978-3-031-73024-5_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73023-8
Online ISBN: 978-3-031-73024-5