Abstract
With the development of deep reinforcement learning and multi-agent modeling, Multi-Agent Reinforcement Learning (MARL) has recently become a very active research topic. Q-Mix is a popular algorithm for MARL tasks in which individual agents can be trained in a centralized manner. As the scale and complexity of MARL tasks grow, there is an urgent need for more efficient training strategies; consequently, a Q-Mix training algorithm that can benefit from parallel computation is highly desirable. However, how classic distributed machine learning frameworks can be combined with Q-Mix is a little-studied problem. In this paper, we propose the PS-QMix algorithm, which applies the Parameter Server framework to training Q-Mix agents in parallel. Our algorithm employs multiple distributed worker threads for data generation and model learning, where the two processes are decoupled and executed in alternation. To accommodate environments with different simulation speeds, the proposed algorithm lets the user tune the relative proportion of computation allocated to data generation and model learning. We evaluate PS-QMix on a StarCraft II micro-combat task. As the number of worker threads increases, we observe significant speed-ups in both data generation and model learning, indicating that our method effectively utilizes distributed computation resources to train Q-Mix agents.
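The worker-thread pattern the abstract describes — workers that pull parameters from a central server, alternate between data generation and model learning, and expose the split between the two phases as a user-set ratio — can be sketched roughly as follows. This is an illustrative toy, not the paper's implementation: the class and function names, the replay buffer, and the stand-in "gradient" computation are all assumptions made for the sketch.

```python
import threading
import random

class ParameterServer:
    """Central store for model weights; workers pull and push asynchronously."""
    def __init__(self, num_params):
        self.params = [0.0] * num_params
        self.lock = threading.Lock()
        self.updates = 0  # count of applied gradient updates

    def pull(self):
        with self.lock:
            return list(self.params)

    def push(self, grads, lr=0.1):
        with self.lock:
            for i, g in enumerate(grads):
                self.params[i] -= lr * g
            self.updates += 1

def worker(server, replay, gen_steps, learn_steps, iterations):
    """Alternate between data generation and model learning.

    gen_steps / learn_steps plays the role of the tunable proportion of
    computation devoted to each phase mentioned in the abstract.
    """
    rng = random.Random(0)
    for _ in range(iterations):
        params = server.pull()
        # --- data-generation phase: act in the environment (stubbed) ---
        for _ in range(gen_steps):
            replay.append([p + rng.gauss(0, 1) for p in params])
        # --- learning phase: compute a (toy) gradient from replay data ---
        for _ in range(learn_steps):
            batch = rng.choice(replay)
            grads = [p - x for p, x in zip(params, batch)]
            server.push(grads)

server = ParameterServer(num_params=4)
replay = []  # shared buffer; list.append is atomic in CPython
threads = [threading.Thread(target=worker, args=(server, replay, 2, 1, 5))
           for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(server.updates)  # 3 workers * 5 iterations * 1 learn step = 15
```

In a real MARL setting, the data-generation phase would run StarCraft II episodes and the learning phase would backpropagate through the Q-Mix network; here both are stubbed so the alternation and the shared-server synchronization are visible.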
Acknowledgement
This work is supported by the National Natural Science Foundation of China (61902425).
Copyright information
© 2022 Springer Nature Switzerland AG
Cite this paper
Liu, X., Li, X., Li, Y., Xiao, B. (2022). PS-QMix: A Parallel Learning Framework for Q-Mix Using Parameter Server. In: Li, B., et al. Advanced Data Mining and Applications. ADMA 2022. Lecture Notes in Computer Science(), vol 13087. Springer, Cham. https://doi.org/10.1007/978-3-030-95405-5_24
Print ISBN: 978-3-030-95404-8
Online ISBN: 978-3-030-95405-5