Abstract
Modeling the trajectories of intelligent vehicles is an essential component of a traffic simulation system. However, such trajectory predictors are typically trained to imitate the movements of human drivers. These imitation models often fall short of capturing the safety-critical events that reside in the long tail of the data distribution, especially in complex environments involving multiple drivers. In this paper, we propose a game-theoretic perspective to resolve this challenge by modeling the competitive interactions of vehicles as a general-sum Markov game and characterizing these safety-critical events with the correlated equilibrium. To achieve this goal, we pretrain a generative world model to predict the environmental dynamics of self-driving scenarios. Based on this world model, we probe the action predictor to identify the Coarse Correlated Equilibrium (CCE) by incorporating both an optimistic Bellman update and magnetic mirror descent into the objective function of the Multi-Agent Reinforcement Learning (MARL) algorithm. Extensive experiments demonstrate that our algorithm outperforms other baselines in efficiently closing the CCE-gap and generating meaningful trajectories in competitive autonomous driving environments. The code is available at: https://github.com/qiaoguanren/MARL-CCE.
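To make the two key quantities named in the abstract concrete, the sketch below gives the standard definition of the CCE-gap and one common closed-form magnetic mirror descent (MMD) step for a discrete action distribution; this is a generic illustration from the regularized-update literature, so the symbols \(\rho\), \(\eta\), and \(\alpha\) are illustrative and the exact instantiation in the paper may differ. For a joint policy \(\boldsymbol{\pi}\),
\[
\mathrm{CCE\text{-}gap}(\boldsymbol{\pi}) \;=\; \max_{i}\Bigl(\max_{\pi_i'}\, V_i^{(\pi_i',\, \boldsymbol{\pi}_{-i})} \;-\; V_i^{\boldsymbol{\pi}}\Bigr),
\]
which is zero exactly when no agent \(i\) can improve its value \(V_i\) by unilaterally deviating from \(\boldsymbol{\pi}\) to some fixed policy \(\pi_i'\). A standard MMD update regularized toward a magnet policy \(\rho\) with step size \(\eta\) and regularization weight \(\alpha\) is
\[
\pi_{t+1}(a) \;\propto\; \bigl(\pi_t(a)\,\rho(a)^{\eta\alpha}\, e^{\eta Q_t(a)}\bigr)^{\tfrac{1}{1+\eta\alpha}},
\]
i.e., the maximizer of \(\eta\langle\pi, Q_t\rangle - \eta\alpha\,\mathrm{KL}(\pi \,\Vert\, \rho) - \mathrm{KL}(\pi \,\Vert\, \pi_t)\) over the action simplex.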
Notes
1. \(\varDelta ^{\mathcal {S}}\) denotes the probability simplex over the space \(\mathcal {S}\).
2. Throughout this work, bold symbols (e.g., \(\boldsymbol{a}\)) indicate a vector of variables, while unbold ones (e.g., \(a\)) represent a single variable.
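Written out explicitly (a standard definition, not additional notation from the paper), the simplex in Note 1 is
\[
\varDelta^{\mathcal{S}} \;=\; \Bigl\{\, p \in \mathbb{R}_{\ge 0}^{|\mathcal{S}|} \;:\; \textstyle\sum_{s \in \mathcal{S}} p(s) = 1 \Bigr\}.
\]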
Acknowledgements
This work is supported in part by Shenzhen Fundamental Research Program (General Program) under grant JCYJ20230807114202005, Guangdong-Shenzhen Joint Research Fund under grant 2023A1515110617, Guangdong Basic and Applied Basic Research Foundation under grant 2024A1515012103, Shenzhen Science and Technology Program ZDSYS20211021111415025, and Guangdong Provincial Key Laboratory of Mathematical Foundations for Artificial Intelligence (2023B1212010001).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Qiao, G., Quan, G., Qu, R., Liu, G. (2025). Modelling Competitive Behaviors in Autonomous Driving Under Generative World Model. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15093. Springer, Cham. https://doi.org/10.1007/978-3-031-72761-0_2
DOI: https://doi.org/10.1007/978-3-031-72761-0_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72760-3
Online ISBN: 978-3-031-72761-0