Abstract
Multi-agent reinforcement learning algorithms require large quantities of interactions with the environment and with other agents to derive an approximately optimal policy. However, these algorithms may struggle with the complex interactive relationships between agents and tend to explore the whole observation space aimlessly, which results in high learning complexity. Motivated by the fact that interactions between agents are typically occasional and local in most real-world scenarios, in this paper we propose a general framework named Discrepancy-Driven Multi-Agent reinforcement learning (DDMA) to overcome this limitation. In this framework, we first parse the semantic components of each agent's observation and introduce a proliferative network that initializes the multi-agent policy directly from the corresponding single-agent optimal policy, bypassing the misalignment between the observation spaces of the two settings. We then model the occasional interactions among agents based on the discrepancy between these two policies, and concentrate exploration on the regions where agents interact frequently. With this direct initialization and focused multi-agent policy learning, our framework accelerates the learning process and significantly improves asymptotic performance. Experimental results on a toy example and several classic benchmarks demonstrate that DDMA achieves superior performance compared to baseline methods.
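Since the full method is not reproduced on this page, the snippet below is only a minimal, hypothetical sketch of the discrepancy signal the abstract describes: it treats the KL divergence between a pretrained single-agent policy and the current multi-agent policy at a given observation as evidence of agent interaction, and scales an exploration bonus by it. All function and variable names here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def policy_discrepancy(single_probs: np.ndarray,
                       multi_probs: np.ndarray,
                       eps: float = 1e-8) -> float:
    """KL(pi_single || pi_multi) at one observation (illustrative).

    A large value is read as evidence that other agents change the
    best response here, i.e. that this is an interaction region.
    """
    p = np.clip(single_probs, eps, 1.0)
    q = np.clip(multi_probs, eps, 1.0)
    p, q = p / p.sum(), q / q.sum()  # renormalize after clipping
    return float(np.sum(p * np.log(p / q)))

def exploration_bonus(discrepancy: float, beta: float = 0.1) -> float:
    """Intrinsic reward scaled by the discrepancy, so that exploration
    concentrates on states where agents appear to interact."""
    return beta * discrepancy

# Toy usage: a 4-action observation where the two policies disagree.
single = np.array([0.7, 0.1, 0.1, 0.1])      # pretrained single-agent policy
multi = np.array([0.25, 0.25, 0.25, 0.25])   # current multi-agent policy
d = policy_discrepancy(single, multi)
print(f"discrepancy={d:.3f}, bonus={exploration_bonus(d):.3f}")
```

In such a sketch, the bonus would be added to the environment reward during multi-agent training, so that low-discrepancy states (where the single-agent policy already suffices) receive little exploration pressure.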
Notes
1. Please refer to https://github.com/chaobiubiu/DDMA for more details.
Acknowledgements
This work is supported by the Shenzhen Fundamental Research Program (No. 2021Szvup056), the Primary Research & Development Plan of Jiangsu Province (No. BE2021028), the National Natural Science Foundation of China (No. 62192783), and the Science and Technology Innovation 2030 New Generation Artificial Intelligence Major Project (No. 2018AAA0100905).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, C., Hu, Y., Tian, P., Dong, S., Gao, Y. (2022). DDMA: Discrepancy-Driven Multi-agent Reinforcement Learning. In: Khanna, S., Cao, J., Bai, Q., Xu, G. (eds) PRICAI 2022: Trends in Artificial Intelligence. PRICAI 2022. Lecture Notes in Computer Science, vol 13631. Springer, Cham. https://doi.org/10.1007/978-3-031-20868-3_7
DOI: https://doi.org/10.1007/978-3-031-20868-3_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20867-6
Online ISBN: 978-3-031-20868-3
eBook Packages: Computer Science, Computer Science (R0)