PS-QMix: A Parallel Learning Framework for Q-Mix Using Parameter Server

  • Conference paper
  • In:
Advanced Data Mining and Applications (ADMA 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13087)


Abstract

With the development of deep reinforcement learning and multi-agent modeling, Multi-Agent Reinforcement Learning (MARL) has recently become a very active research topic. Q-Mix is a popular algorithm for MARL tasks in which individual agents can be trained in a centralized manner. As the scale and complexity of MARL tasks grow, more efficient training strategies are urgently needed, so it is desirable to develop a Q-Mix training algorithm that can benefit from parallel computation. However, how classic distributed machine learning frameworks can be combined with Q-Mix remains a little-studied problem. In this paper, we propose the PS-QMix algorithm, which applies the Parameter Server framework to train Q-Mix agents in parallel. Our algorithm employs multiple distributed worker threads for data generation and model learning, where the two processes are decoupled and executed in alternation. To accommodate environments with different simulation speeds, the algorithm lets the user tune the relative proportion of computation allocated to data generation and model learning. We evaluate PS-QMix on a StarCraft II micro-combat task. As the number of worker threads increases, we observe significant speed-ups in both data generation and model learning, indicating that our method effectively utilizes distributed computation resources to train Q-Mix agents.
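The worker design described in the abstract (a central parameter store, plus worker threads that alternate between generating experience and computing updates, with a tunable generation-to-learning ratio) can be sketched in a minimal, hypothetical form. Everything below is illustrative and not taken from the paper: the class and function names, the toy "environment" (a random transition), and the toy gradient are all assumptions standing in for the real Q-Mix networks and StarCraft II simulator.

```python
import threading
import random

class ParameterServer:
    """Central store for model parameters; workers pull copies and push updates."""
    def __init__(self, dim):
        self.params = [0.0] * dim
        self.lock = threading.Lock()

    def pull(self):
        with self.lock:
            return list(self.params)

    def push(self, grads, lr=0.1):
        with self.lock:
            for i, g in enumerate(grads):
                self.params[i] -= lr * g

def worker(ps, replay, steps, gen_ratio):
    """Alternate between data generation and model learning.

    gen_ratio is the tunable knob from the abstract: how many data-generation
    steps are taken for every learning step.
    """
    for step in range(steps):
        params = ps.pull()
        if step % (gen_ratio + 1) < gen_ratio:
            # Data generation: act in a toy environment and store a transition.
            transition = [random.gauss(p, 1.0) for p in params]
            replay.append(transition)
        elif replay:
            # Model learning: compute a toy gradient from a sampled transition
            # and push it back to the parameter server.
            batch = random.choice(replay)
            grads = [p - t for p, t in zip(params, batch)]
            ps.push(grads)

ps = ParameterServer(dim=4)
replay = []
threads = [threading.Thread(target=worker, args=(ps, replay, 200, 3))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(ps.pull()))
```

With `gen_ratio=3`, each worker spends three steps generating data for every learning step; a slower simulator would call for a smaller ratio so that learning does not starve. In a real system the replay buffer and gradient computation would of course involve the Q-Mix networks rather than these toy stand-ins.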



Acknowledgement

This work is supported by the National Natural Science Foundation of China (61902425).

Author information

Correspondence to Xiang Li.


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, X., Li, X., Li, Y., Xiao, B. (2022). PS-QMix: A Parallel Learning Framework for Q-Mix Using Parameter Server. In: Li, B., et al. (eds.) Advanced Data Mining and Applications. ADMA 2022. Lecture Notes in Computer Science, vol. 13087. Springer, Cham. https://doi.org/10.1007/978-3-030-95405-5_24

  • DOI: https://doi.org/10.1007/978-3-030-95405-5_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-95404-8

  • Online ISBN: 978-3-030-95405-5

  • eBook Packages: Computer Science (R0)
