Abstract
Multi-agent reinforcement learning (MARL) often adopts the centralized training with decentralized execution (CTDE) framework to facilitate cooperation among agents. However, deploying CTDE-based MARL algorithms in real-world scenarios requires gradient transmission and parameter synchronization at every training step, which can incur prohibitive communication overhead. To improve communication efficiency, federated MARL has been proposed, which averages gradients only periodically. However, as our theoretical analysis shows, such straightforward averaging leads to poor coordination and slow convergence arising from the non-i.i.d. problem. To address these two challenges, we propose a federated MARL framework, termed cost-efficient federated multi-agent reinforcement learning with learnable aggregation (FMRL-LA). Specifically, we use asynchronous critics to improve communication efficiency by filtering out redundant local updates based on estimated agent utilities. A centralized aggregator rectifies these estimates conditioned on global information, improving cooperation and reducing the non-i.i.d. impact by maximizing composite system objectives. For a comprehensive evaluation, we extend a challenging multi-agent autonomous driving environment to the federated learning paradigm and compare our method against competitive MARL baselines. Our findings indicate that FMRL-LA adeptly balances performance and efficiency. Code and appendix are available at https://github.com/ArronDZhang/FMRL_LA.
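To make the communication pattern concrete, below is a minimal, illustrative Python sketch of one federated round in which local updates are filtered by an estimated utility before averaging. This is not the paper's FMRL-LA implementation: the function `federated_round`, the `utility_threshold` parameter, and the toy utilities are hypothetical placeholders, and the paper's learnable aggregator is replaced here by a simple threshold for exposition.

```python
import numpy as np

def federated_round(local_params, utilities, utility_threshold=0.0):
    """One hypothetical federated averaging round with utility filtering.

    local_params: list of 1-D np.ndarray, one parameter vector per agent
    utilities:    list of float, estimated utility of each agent's update
    Agents whose estimated utility falls below the threshold are treated
    as redundant and excluded from the average (a stand-in for the
    paper's learned filtering, NOT the actual FMRL-LA rule).
    """
    selected = [p for p, u in zip(local_params, utilities)
                if u > utility_threshold]
    if not selected:               # fall back to plain FedAvg if all filtered
        selected = local_params
    return np.mean(selected, axis=0)  # synchronized global parameters

# Toy usage: three agents; agent 1's low-utility update is filtered out.
params = [np.array([1.0, 2.0]), np.array([5.0, 6.0]), np.array([1.2, 2.2])]
utils = [0.9, -0.3, 0.8]
print(federated_round(params, utils))  # -> [1.1 2.1], average of agents 0 and 2
```

Because averaging happens only once per round rather than at every training step, the per-step gradient exchange of CTDE is avoided; the filtering step further reduces how many local updates must be transmitted.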
Acknowledgements
This work is supported by project DE200101610, funded by the Australian Research Council, and by CSIRO's Science Leader project R-91559.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhang, Y., Wang, S., Chen, Z., Xu, X., Funiak, S., Liu, J. (2024). Towards Cost-Efficient Federated Multi-agent RL with Learnable Aggregation. In: Yang, D.N., Xie, X., Tseng, V.S., Pei, J., Huang, J.W., Lin, J.C.W. (eds.) Advances in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science, vol. 14646. Springer, Singapore. https://doi.org/10.1007/978-981-97-2253-2_14
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2252-5
Online ISBN: 978-981-97-2253-2