Abstract
Diffusion models are promising for joint trajectory prediction and controllable generation in autonomous driving, but they suffer from inefficient inference steps and high computational demands. To tackle these challenges, we introduce Optimal Gaussian Diffusion (OGD) and Estimated Clean Manifold (ECM) Guidance. OGD optimizes the prior distribution for a small diffusion time T and starts the reverse diffusion process from it. ECM injects guidance gradients directly into the estimated clean manifold, eliminating extensive gradient backpropagation throughout the network. Our methodology streamlines the generative process, enabling practical applications with reduced computational overhead. Experimental validation on the large-scale Argoverse 2 dataset demonstrates our approach's superior performance, offering a viable solution for computationally efficient, high-quality joint trajectory prediction and controllable generation for autonomous driving. Our project webpage is at https://yixiaowang7.github.io/OptTrajDiff_Page/
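To make the ECM idea concrete, the sketch below shows one guided denoising step in that spirit: estimate the clean sample from the noisy one, apply the guidance-cost gradient directly on that clean estimate (rather than backpropagating the cost through the denoiser network), then re-noise to the next diffusion time. The `denoise` function, guidance cost, and noise schedule here are toy stand-ins, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise(x_t, t):
    # Toy stand-in for the learned denoiser's clean-sample estimate.
    # In the paper this would be the trained diffusion network.
    return x_t / (1.0 + t)

def guidance_cost_grad(x0_hat, target):
    # Gradient of a simple quadratic cost pulling trajectories toward a target.
    return 2.0 * (x0_hat - target)

def ecm_guided_step(x_t, t, target, step_size=0.1):
    """One guided update in the spirit of ECM: estimate the clean sample,
    apply the guidance gradient directly on that estimate (no backprop
    through the denoiser), then re-noise toward the next diffusion time."""
    x0_hat = denoise(x_t, t)
    x0_guided = x0_hat - step_size * guidance_cost_grad(x0_hat, target)
    noise = rng.normal(size=x_t.shape)          # toy noising schedule
    return x0_guided + 0.1 * (t - 1) * noise

x = rng.normal(size=(4, 2))                     # 4 agents, 2-D waypoints (toy)
target = np.zeros((4, 2))                       # guidance: steer toward origin
for t in range(5, 0, -1):
    x = ecm_guided_step(x, t, target)
```

Because the guidance gradient touches only the clean estimate, each step costs one forward pass of the denoiser plus a cheap gradient of the guidance cost, which is the source of the computational savings the abstract describes.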
Acknowledgement
This work was supported by Berkeley DeepDrive (https://deepdrive.berkeley.edu).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Wang, Y. et al. (2025). Optimizing Diffusion Models for Joint Trajectory Prediction and Controllable Generation. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15087. Springer, Cham. https://doi.org/10.1007/978-3-031-73397-0_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73396-3
Online ISBN: 978-3-031-73397-0
eBook Packages: Computer Science (R0)