Abstract
Multi-Agent Reinforcement Learning (MARL) promises to address the challenges of cooperation and competition among multiple agents, often in safety-critical scenarios. Progress on safe MARL, however, remains limited. Existing works extend single-agent safe learning approaches, employing shielding or backup policies to enforce safety constraints. These approaches presuppose good cooperation among the agents, and weakly distributed schemes with centralized shielding become infeasible in complex situations such as non-cooperative agents and coordination failures. In this paper, we integrate Hamilton-Jacobi (HJ) reachability theory into a Centralized Training with Decentralized Execution (CTDE) framework for safe MARL. Our framework learns safety policies without requiring a system model or pre-training of a shielding layer. In addition, we improve adaptability to varying levels of cooperation through a conservative approximation of the safety value function. Experimental results validate the efficacy of the proposed method: it ensures safety while achieving the target tasks under cooperative conditions, and it remains robust to non-cooperative behaviors induced by complex disturbances.
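To make the idea behind HJ-reachability-based safe learning concrete, the following is a minimal sketch of the discounted safety Bellman backup introduced by Fisac et al. (ICRA 2019), which this line of work builds on, with a simple downward bias standing in for the conservative value approximation the abstract describes. This is not the authors' implementation: the tabular setting, the coefficient KAPPA, and the function names are illustrative assumptions, and in the paper's CTDE setting the backup would instead act on a learned centralized critic.

```python
import numpy as np

GAMMA = 0.99   # discount factor, annealed toward 1 in practice
KAPPA = 0.05   # conservatism coefficient (hypothetical), biases values down
ALPHA = 0.1    # learning rate

def safety_backup(Q, l, s, a, s_next):
    """One conservative, discounted HJ safety backup for a tabular Q.

    Q: np.ndarray of shape (n_states, n_actions), safety action-values.
    l: np.ndarray of shape (n_states,), signed safety margin; l[s] >= 0
       iff state s satisfies the safety constraint.
    """
    # Best continuation value over the agent's own actions at the next state.
    v_next = np.max(Q[s_next])
    # Discounted safety Bellman target: tracks the worst safety margin along
    # the trajectory, contracted by GAMMA so the fixed point is well defined.
    target = (1.0 - GAMMA) * l[s] + GAMMA * min(l[s], v_next)
    # Conservative shift (our illustrative assumption): uniformly lowering the
    # target makes the learned safe set an under-approximation, hedging
    # against partners that cooperate less than assumed.
    target -= KAPPA * abs(target)
    Q[s, a] += ALPHA * (target - Q[s, a])
    return Q
```

Under this backup, a state is estimated safe when its learned value is non-negative, and the conservative shift shrinks that estimated safe set rather than enlarging it.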
Code Availability
The code used in this paper is available from the corresponding author upon reasonable request.
Acknowledgements
We gratefully acknowledge the support of Scientific and Technological Innovation 2030 under Grant 2021ZD0110900.
Funding
This research was funded by Scientific and Technological Innovation 2030 under Grant 2021ZD0110900.
Author information
Contributions
Kai Zhu contributed to the design, implementation, and manuscript writing of the research; Fengbo Lan and Wenbo Zhao contributed to the discussion of the experiments and the revision of the manuscript; Tao Zhang supervised the work and reviewed and edited the final version of the paper.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflicts of interest.
Consent to participate
All authors of this research paper have consented to participate in the research study.
Consent for publication
All authors of this research paper have read and approved the submitted version.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhu, K., Lan, F., Zhao, W. et al. Safe Multi-Agent Reinforcement Learning via Approximate Hamilton-Jacobi Reachability. J Intell Robot Syst 111, 7 (2025). https://doi.org/10.1007/s10846-024-02156-6