Abstract
Diffusion models are promising for joint trajectory prediction and controllable generation in autonomous driving, but they suffer from inefficient inference steps and high computational demands. To tackle these challenges, we introduce Optimal Gaussian Diffusion (OGD) and Estimated Clean Manifold (ECM) Guidance. OGD optimizes the prior distribution for a small diffusion time T and starts the reverse diffusion process from it. ECM injects guidance gradients directly into the estimated clean manifold, eliminating extensive gradient backpropagation throughout the network. Our methodology streamlines the generative process, enabling practical applications with reduced computational overhead. Experimental validation on the large-scale Argoverse 2 dataset demonstrates our approach's superior performance, offering a viable solution for computationally efficient, high-quality joint trajectory prediction and controllable generation for autonomous driving. Our project webpage is at https://yixiaowang7.github.io/OptTrajDiff_Page/
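To make the ECM idea concrete, the sketch below shows one guided denoising step in that spirit: estimate the clean sample from the noisy one, apply the guidance-cost gradient directly on that clean estimate (rather than backpropagating the cost through the denoiser network), then re-noise to the next diffusion time. The `denoise` function, guidance cost, and noise schedule here are toy stand-ins, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise(x_t, t):
    # Toy stand-in for the learned denoiser's clean-sample estimate.
    # In the paper this would be the trained diffusion network.
    return x_t / (1.0 + t)

def guidance_cost_grad(x0_hat, target):
    # Gradient of a simple quadratic cost pulling trajectories toward a target.
    return 2.0 * (x0_hat - target)

def ecm_guided_step(x_t, t, target, step_size=0.1):
    """One guided update in the spirit of ECM: estimate the clean sample,
    apply the guidance gradient directly on that estimate (no backprop
    through the denoiser), then re-noise toward the next diffusion time."""
    x0_hat = denoise(x_t, t)
    x0_guided = x0_hat - step_size * guidance_cost_grad(x0_hat, target)
    noise = rng.normal(size=x_t.shape)          # toy noising schedule
    return x0_guided + 0.1 * (t - 1) * noise

x = rng.normal(size=(4, 2))                     # 4 agents, 2-D waypoints (toy)
target = np.zeros((4, 2))                       # guidance: steer toward origin
for t in range(5, 0, -1):
    x = ecm_guided_step(x, t, target)
```

Because the guidance gradient touches only the clean estimate, each step costs one forward pass of the denoiser plus a cheap gradient of the guidance cost, which is the source of the computational savings the abstract describes.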
Acknowledgement
This work was supported by Berkeley DeepDrive (https://deepdrive.berkeley.edu).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Wang, Y. et al. (2025). Optimizing Diffusion Models for Joint Trajectory Prediction and Controllable Generation. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15087. Springer, Cham. https://doi.org/10.1007/978-3-031-73397-0_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73396-3
Online ISBN: 978-3-031-73397-0
eBook Packages: Computer Science (R0)