Abstract
Modeling the trajectories of intelligent vehicles is an essential component of a traffic simulation system. However, such trajectory predictors are typically trained to imitate the movements of human drivers. These imitation models often fall short of capturing the safety-critical events that reside in the long tail of the data distribution, especially in complex environments involving multiple drivers. In this paper, we propose a game-theoretic perspective to resolve this challenge by modeling the competitive interactions of vehicles as a general-sum Markov game and characterizing these safety-critical events with the correlated equilibrium. To achieve this goal, we pretrain a generative world model to predict the environmental dynamics of self-driving scenarios. Based on this world model, we probe the action predictor to identify the Coarse Correlated Equilibrium (CCE) by incorporating both an optimistic Bellman update and magnetic mirror descent into the objective function of the Multi-Agent Reinforcement Learning (MARL) algorithm. Extensive experiments demonstrate that our algorithm outperforms other baselines in efficiently closing the CCE-gap and generating meaningful trajectories in competitive autonomous driving environments. The code is available at: https://github.com/qiaoguanren/MARL-CCE.
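To make the two key quantities named in the abstract concrete, the sketch below gives the standard definition of the CCE-gap and one common closed-form magnetic mirror descent (MMD) step for a discrete action distribution; this is a generic illustration from the regularized-update literature, so the symbols \(\rho\), \(\eta\), and \(\alpha\) are illustrative and the exact instantiation in the paper may differ. For a joint policy \(\boldsymbol{\pi}\),
\[
\mathrm{CCE\text{-}gap}(\boldsymbol{\pi}) \;=\; \max_{i}\Bigl(\max_{\pi_i'}\, V_i^{(\pi_i',\, \boldsymbol{\pi}_{-i})} \;-\; V_i^{\boldsymbol{\pi}}\Bigr),
\]
which is zero exactly when no agent \(i\) can improve its value \(V_i\) by unilaterally deviating from \(\boldsymbol{\pi}\) to some fixed policy \(\pi_i'\). A standard MMD update regularized toward a magnet policy \(\rho\) with step size \(\eta\) and regularization weight \(\alpha\) is
\[
\pi_{t+1}(a) \;\propto\; \bigl(\pi_t(a)\,\rho(a)^{\eta\alpha}\, e^{\eta Q_t(a)}\bigr)^{\tfrac{1}{1+\eta\alpha}},
\]
i.e., the maximizer of \(\eta\langle\pi, Q_t\rangle - \eta\alpha\,\mathrm{KL}(\pi \,\Vert\, \rho) - \mathrm{KL}(\pi \,\Vert\, \pi_t)\) over the action simplex.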
Notes
1. \(\varDelta ^{\mathcal {S}}\) denotes the probability simplex over the space \(\mathcal {S}\).
2. Throughout this work, bold symbols (e.g., \(\boldsymbol{a}\)) indicate a vector of variables, while unbold ones (e.g., \(a\)) represent a single variable.
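Written out explicitly (a standard definition, not additional notation from the paper), the simplex in Note 1 is
\[
\varDelta^{\mathcal{S}} \;=\; \Bigl\{\, p \in \mathbb{R}_{\ge 0}^{|\mathcal{S}|} \;:\; \textstyle\sum_{s \in \mathcal{S}} p(s) = 1 \Bigr\}.
\]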
Acknowledgements
This work is supported in part by Shenzhen Fundamental Research Program (General Program) under grant JCYJ20230807114202005, Guangdong-Shenzhen Joint Research Fund under grant 2023A1515110617, Guangdong Basic and Applied Basic Research Foundation under grant 2024A1515012103, Shenzhen Science and Technology Program ZDSYS20211021111415025, and Guangdong Provincial Key Laboratory of Mathematical Foundations for Artificial Intelligence (2023B1212010001).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Qiao, G., Quan, G., Qu, R., Liu, G. (2025). Modelling Competitive Behaviors in Autonomous Driving Under Generative World Model. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15093. Springer, Cham. https://doi.org/10.1007/978-3-031-72761-0_2
DOI: https://doi.org/10.1007/978-3-031-72761-0_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72760-3
Online ISBN: 978-3-031-72761-0