PS-QMix: A Parallel Learning Framework for Q-Mix Using Parameter Server

  • Conference paper
  • In:
Advanced Data Mining and Applications (ADMA 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13087)


Abstract

With the development of deep reinforcement learning and multi-agent modeling, Multi-Agent Reinforcement Learning (MARL) has recently become a very active research topic. Q-Mix is a popular algorithm for MARL tasks in which individual agents can be trained in a centralized manner. As the scale and complexity of MARL tasks grow, more efficient training strategies are urgently needed, so it is desirable to develop a Q-Mix training algorithm that can benefit from parallel computation. However, how classic distributed machine learning frameworks can be combined with Q-Mix remains a little-studied problem. In this paper, we propose the PS-QMix algorithm, which applies the Parameter Server framework to train Q-Mix agents in parallel. Our algorithm employs multiple distributed worker threads for data generation and model learning, where the two processes are decoupled and executed in alternation. To accommodate environments with different simulation speeds, the algorithm lets the user tune the relative proportion of computation allocated to data generation and model learning. We evaluate PS-QMix on a StarCraft II micro-combat task. As the number of worker threads increases, we observe significant speed-ups in both data generation and model learning, indicating that our method effectively utilizes distributed computation resources to train Q-Mix agents.
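The worker design described in the abstract (a central parameter store, plus worker threads that alternate between generating experience and computing updates, with a tunable generation-to-learning ratio) can be sketched in a minimal, hypothetical form. Everything below is illustrative and not taken from the paper: the class and function names, the toy "environment" (a random transition), and the toy gradient are all assumptions standing in for the real Q-Mix networks and StarCraft II simulator.

```python
import threading
import random

class ParameterServer:
    """Central store for model parameters; workers pull copies and push updates."""
    def __init__(self, dim):
        self.params = [0.0] * dim
        self.lock = threading.Lock()

    def pull(self):
        with self.lock:
            return list(self.params)

    def push(self, grads, lr=0.1):
        with self.lock:
            for i, g in enumerate(grads):
                self.params[i] -= lr * g

def worker(ps, replay, steps, gen_ratio):
    """Alternate between data generation and model learning.

    gen_ratio is the tunable knob from the abstract: how many data-generation
    steps are taken for every learning step.
    """
    for step in range(steps):
        params = ps.pull()
        if step % (gen_ratio + 1) < gen_ratio:
            # Data generation: act in a toy environment and store a transition.
            transition = [random.gauss(p, 1.0) for p in params]
            replay.append(transition)
        elif replay:
            # Model learning: compute a toy gradient from a sampled transition
            # and push it back to the parameter server.
            batch = random.choice(replay)
            grads = [p - t for p, t in zip(params, batch)]
            ps.push(grads)

ps = ParameterServer(dim=4)
replay = []
threads = [threading.Thread(target=worker, args=(ps, replay, 200, 3))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(ps.pull()))
```

With `gen_ratio=3`, each worker spends three steps generating data for every learning step; a slower simulator would call for a smaller ratio so that learning does not starve. In a real system the replay buffer and gradient computation would of course involve the Q-Mix networks rather than these toy stand-ins.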



Acknowledgement

This work is supported by the National Natural Science Foundation of China (61902425).

Author information

Correspondence to Xiang Li.


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, X., Li, X., Li, Y., Xiao, B. (2022). PS-QMix: A Parallel Learning Framework for Q-Mix Using Parameter Server. In: Li, B., et al. (eds.) Advanced Data Mining and Applications. ADMA 2022. Lecture Notes in Computer Science, vol. 13087. Springer, Cham. https://doi.org/10.1007/978-3-030-95405-5_24

  • DOI: https://doi.org/10.1007/978-3-030-95405-5_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-95404-8

  • Online ISBN: 978-3-030-95405-5

  • eBook Packages: Computer Science (R0)
