Modelling Competitive Behaviors in Autonomous Driving Under Generative World Model

  • Conference paper
  • First Online:
Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15093)

Included in the following conference series: European Conference on Computer Vision (ECCV)

Abstract

Modeling the trajectories of intelligent vehicles is an essential component of a traffic simulation system. However, such trajectory predictors are typically trained to imitate the movements of human drivers. These imitation models often fall short of capturing safety-critical events residing in the long-tail end of the data distribution, especially in complex environments involving multiple drivers. In this paper, we propose a game-theoretic perspective to resolve this challenge by modeling the competitive interactions of vehicles in a general-sum Markov game and characterizing these safety-critical events with the correlated equilibrium. To achieve this goal, we pretrain a generative world model to predict the environmental dynamics of self-driving scenarios. Based on this world model, we probe the action predictor to identify the Coarse Correlated Equilibrium (CCE) by incorporating both an optimistic Bellman update and magnetic mirror descent into the objective function of the Multi-Agent Reinforcement Learning (MARL) algorithm. We conduct extensive experiments to demonstrate that our algorithm outperforms baseline methods in terms of efficiently closing the CCE-gap and generating meaningful trajectories in competitive autonomous driving environments. The code is available at https://github.com/qiaoguanren/MARL-CCE.
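
The abstract names two ingredients of the MARL objective: an optimistic Bellman update and magnetic mirror descent (MMD). As a rough, self-contained illustration of what such updates look like for a single agent with a discrete action set, the NumPy sketch below implements the closed-form MMD policy step (a KL-regularized mirror-descent update pulled toward a "magnet" policy, in the spirit of Sokota et al., 2023) and a generic optimistic Bellman target. The function names, hyperparameters, and exploration-bonus form are illustrative assumptions, not taken from the paper or the released MARL-CCE code.

```python
import numpy as np

def mmd_policy_step(pi, q_values, magnet, eta=0.1, alpha=1.0):
    """One magnetic mirror descent step on a discrete action simplex.

    Solves  argmax_p  <p, q> - alpha * KL(p || magnet) - (1/eta) * KL(p || pi)
    in closed form:
        p(a) ∝ pi(a)^(1/(1+eta*alpha)) * magnet(a)^(eta*alpha/(1+eta*alpha))
               * exp(eta * q(a) / (1 + eta*alpha))
    """
    logits = (np.log(pi) + eta * alpha * np.log(magnet) + eta * q_values) / (1.0 + eta * alpha)
    logits -= logits.max()              # subtract max for numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum()

def optimistic_q_target(reward, next_q, bonus, gamma=0.99):
    """Generic optimistic one-step Bellman target: the usual
    r + gamma * max_a' Q(s', a') backup plus an exploration bonus."""
    return reward + bonus + gamma * np.max(next_q)

# Toy usage: three actions, uniform magnet policy.
pi = np.array([0.5, 0.3, 0.2])
magnet = np.ones(3) / 3
q = np.array([1.0, 0.0, -1.0])
print(mmd_policy_step(pi, q, magnet))   # shifts probability mass toward the high-value action
```

In a setting like the paper's, each controlled vehicle would apply such a policy update against value estimates obtained by rolling out the pretrained world model; in the MMD literature, the pull toward the magnet policy is the regularization that underlies last-iterate convergence guarantees toward (regularized) equilibria.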

Notes

  1. \(\varDelta ^{\mathcal {S}}\) denotes the probability simplex over the space \(\mathcal {S}\).

  2. Throughout this work, the bold symbols (e.g., \(\boldsymbol{a}\)) indicate a vector of variables while the unbold ones (e.g., a) represent a single variable.


Acknowledgements

This work is supported in part by Shenzhen Fundamental Research Program (General Program) under grant JCYJ20230807114202005, Guangdong-Shenzhen Joint Research Fund under grant 2023A1515110617, Guangdong Basic and Applied Basic Research Foundation under grant 2024A1515012103, Shenzhen Science and Technology Program ZDSYS20211021111415025, and Guangdong Provincial Key Laboratory of Mathematical Foundations for Artificial Intelligence (2023B1212010001).

Author information

Corresponding author

Correspondence to Guiliang Liu.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 2578 KB)

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Qiao, G., Quan, G., Qu, R., Liu, G. (2025). Modelling Competitive Behaviors in Autonomous Driving Under Generative World Model. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15093. Springer, Cham. https://doi.org/10.1007/978-3-031-72761-0_2

  • DOI: https://doi.org/10.1007/978-3-031-72761-0_2

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72760-3

  • Online ISBN: 978-3-031-72761-0

  • eBook Packages: Computer Science, Computer Science (R0)
