SCC-rFMQ: a multiagent reinforcement learning method in cooperative Markov games with continuous actions

  • Original Article

International Journal of Machine Learning and Cybernetics

Abstract

Although many multiagent reinforcement learning (MARL) methods have been proposed for learning optimal solutions in continuous-action domains, multiagent cooperation domains with independent learners (ILs) have received relatively little attention, especially in the traditional RL setting. In this paper, we propose a sample-based independent learning method, named Sample Continuous Coordination with recursive Frequency Maximum Q-Value (SCC-rFMQ), which divides the multiagent cooperative problem with continuous actions into two layers. The first layer samples a finite set of actions from the continuous action space by a re-sampling mechanism with variable exploratory rates, and the second layer evaluates the actions in the sampled action set and updates the policy using a cooperative reinforcement learning method. By constructing cooperative mechanisms at both layers, SCC-rFMQ can effectively handle cooperative Markov games with continuous actions. The effectiveness of SCC-rFMQ is demonstrated experimentally on two well-designed games, i.e., a continuous version of the climbing game and a cooperative version of the boat problem. Experimental results show that SCC-rFMQ outperforms other reinforcement learning algorithms.

Notes

  1. https://github.com/zcchenvy/SCC-rFMQ

  2. https://github.com/marlbenchmark/on-policy.

References

  1. Chevalier-Boisvert M, Willems L, Pal S (2018) Minimalistic gridworld environment for openai gym. GitHub repository, GitHub. https://github.com/maximecb/gym-minigrid

  2. Chu T, Wang J, Codecà L, Li Z (2019) Multi-agent deep reinforcement learning for large-scale traffic signal control. IEEE Trans Intell Transport Syst 21(3):1086–1095

  3. Foerster JN, Farquhar G, Afouras T, Nardelli N, Whiteson S (2018) Counterfactual multi-agent policy gradients. In: Thirty-Second AAAI Conference on artificial intelligence, vol 32, no. 1. AAAI

  4. Ganapathi Subramanian S, Poupart P, Taylor ME, Hegde N (2020) Multi type mean field reinforcement learning. In: Proceedings of the 19th International Conference on autonomous agents and multiagent systems. AAMAS, pp 411–419

  5. Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press

  6. Haarnoja T, Zhou A, Abbeel P, Levine S (2018) Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: International Conference on machine learning. PMLR, pp 1856–1865

  7. Hao X, Wang W, Hao J, Yang Y (2019) Independent generative adversarial self-imitation learning in cooperative multiagent systems. In: Proceedings of the 18th International Conference on autonomous agents and multiagent systems. AAMAS, pp 1315–1323

  8. Jong SD, Tuyls K, Verbeeck K (2008) Artificial agents learning human fairness. In: Proceedings of the 7th International Joint Conference on autonomous agents and multiagent systems. AAMAS, pp 863–870

  9. Jouffe L (1998) Fuzzy inference system learning by reinforcement methods. IEEE Trans Syst Man Cybern Part C 28(3):338–355

  10. Konda VR, Tsitsiklis JN (2003) Actor-critic algorithms. In: Advances in neural information processing systems, pp 1008–1014

  11. Lauer M, Riedmiller M (2000) An algorithm for distributed reinforcement learning in cooperative multi-agent systems. In: Proceedings of the Seventeenth International Conference on machine learning. Citeseer

  12. Lauer M, Riedmiller MA (2000) An algorithm for distributed reinforcement learning in cooperative multi-agent systems. In: Proceedings of the 17th International Conference on machine learning. ICML, pp 535–542

  13. Lazaric A, Restelli M, Bonarini A (2007) Reinforcement learning in continuous action spaces through sequential Monte Carlo methods. In: Conference on neural information processing systems. NeurIPS, pp 833–840

  14. Li D, Yang Q, Yu W, An D, Zhang Y, Zhao W (2020) Towards differential privacy-based online double auction for smart grid. IEEE Trans Inf Forensics and Secur 15:971–986. https://doi.org/10.1109/TIFS.2019.2932911

  15. Li H, Wu Y, Chen M (2020) Adaptive fault-tolerant tracking control for discrete-time multiagent systems via reinforcement learning algorithm. IEEE Trans Cybern 51(99):1–12

  16. Liang H, Liu G, Zhang H, Huang T (2020) Neural-network-based event-triggered adaptive control of nonaffine nonlinear multiagent systems with dynamic uncertainties. IEEE Trans Neural Netw Learn Syst 32(5):2239–2250

  17. Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971

  18. Lowe R, Wu Y, Tamar A, Harb J, Abbeel P, Mordatch I (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. In: Advances in neural information processing systems, vol 30. Curran Associates, Inc, pp 6379–6390

  19. Matignon L, Laurent GJ, Fort-Piat NL (2007) Hysteretic q-learning: an algorithm for decentralized reinforcement learning in cooperative multi-agent teams. In: IEEE/RSJ International Conference on intelligent robots and systems IROS. IEEE, pp 64–69

  20. Matignon L, Laurent GJ, Le Fort-Piat N (2012) Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems. Knowl Eng Rev 27(1):1–31

  21. Meng J, Williams D, Shen C (2015) Channels matter: multimodal connectedness, types of co-players and social capital for multiplayer online battle arena gamers. Comput Hum Behav 52:190–199

  22. Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533

  23. Omidshafiei S, Pazis J, Amato C, How JP, Vian J (2017) Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In: Proceedings of the 34th International Conference on machine learning, vol 70. JMLR.org, pp 2681–2690

  24. Palmer G, Savani R, Tuyls K (2019) Negative update intervals in deep multi-agent reinforcement learning. In: Proceedings of the 18th International Conference on autonomous agents and multiagent systems, pp 43–51. International Foundation for Autonomous Agents and Multiagent Systems

  25. Palmer G, Tuyls K, Bloembergen D, Savani R (2018) Lenient multi-agent deep reinforcement learning. In: Proceedings of the 17th International Conference on autonomous agents and multiagent systems, pp 443–451. International Foundation for Autonomous Agents and Multiagent Systems

  26. Pan Y, Du P, Xue H, Lam HK (2020) Singularity-free fixed-time fuzzy control for robotic systems with user-defined performance. IEEE Trans Fuzzy Syst. https://doi.org/10.1109/TFUZZ.2020.2999746

  27. Panait L, Sullivan K, Luke S (2006) Lenient learners in cooperative multiagent systems. In: Proceedings of the 5th International Joint Conference on autonomous agents and multiagent systems. AAMAS, pp 801–803

  28. Peters J, Schaal S (2008) Reinforcement learning of motor skills with policy gradients. Neural Netw 21(4):682–697

  29. Rashid T, Samvelyan M, Witt CS, Farquhar G, Foerster J, Whiteson S (2018) Qmix: monotonic value function factorization for deep multi-agent reinforcement learning. In: International Conference on machine learning. PMLR, pp 4295–4304

  30. Riedmiller M, Gabel T, Hafner R, Lange S (2009) Reinforcement learning for robot soccer. Auton Robots 27(1):55–73

  31. Saha Ray S (2016) Numerical analysis with algorithms and programming. CRC Press, Taylor & Francis Group, Boca Raton

  32. Sallans B, Hinton GE (2004) Reinforcement learning with factored states and actions. J Mach Learn Res 5:1063–1088

  33. Schulman J, Levine S, Abbeel P, Jordan M, Moritz P (2015) Trust region policy optimization. In: International Conference on machine learning. PMLR, pp 1889–1897

  34. Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347

  35. Son K, Kim D, Kang WJ, Hostallero DE, Yi Y (2019) Qtran: learning to factorize with transformation for cooperative multi-agent reinforcement learning. In: International Conference on machine learning. PMLR, pp 5887–5896

  36. Sukhbaatar S, Fergus R et al (2016) Learning multiagent communication with backpropagation. In: Advances in neural information processing systems. NeurIPS, pp 2244–2252

  37. Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. MIT Press

  38. Sutton RS, Maei HR, Precup D, Bhatnagar S, Silver D, Wiewiora E (2009) Fast gradient-descent methods for temporal-difference learning with linear function approximation. In: Proceedings of the 26th Annual International Conference on Machine Learning. PMLR, pp 993–1000

  39. Tang H, Houthooft R, Foote D, Stooke A, Chen X, Duan Y, Schulman J, De Turck F, Abbeel P (2017) #Exploration: a study of count-based exploration for deep reinforcement learning. In: 31st Conference on neural information processing systems (NIPS), vol 30, pp 1–18

  40. Thathachar ML, Sastry PS (2002) Varieties of learning automata: an overview. IEEE Trans Syst Man Cybern Part B Cybern 32(6):711–722

  41. Wei E, Luke S (2016) Lenient learning in independent-learner stochastic cooperative games. J Mach Learn Res 17(1):2914–2955

  42. Wen C, Yao X, Wang Y, Tan X (2020) Smix (\(\lambda\)): enhancing centralized value functions for cooperative multi-agent reinforcement learning. In: Proceedings of the AAAI Conference on Artificial Intelligence. AAAI, pp 7301–7308

  43. Yang Y, Luo R, Li M, Zhou M, Zhang W, Wang J (2018) Mean field multi-agent reinforcement learning. In: The 35th International Conference on machine learning. PMLR, pp 5571–5580

  44. Yu C, Velu A, Vinitsky E, Wang Y, Bayen A, Wu Y (2021) The surprising effectiveness of ppo in cooperative multi-agent games. arXiv preprint arXiv:2103.01955

Acknowledgements

This work is supported by the National Key R&D Program of China (No. 2020YFB1006102), the National Natural Science Foundation of China (Nos. 61906027 and 61906135), and the China Postdoctoral Science Foundation Funded Project (No. 2019M661080).

Author information

Corresponding author

Correspondence to Chengwei Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Bilinear interpolation

Using the bilinear interpolation technique [31], we construct the continuous game models extended from the CG (PSCG) game. Bilinear interpolation is an extension of linear interpolation for interpolating functions of two variables on a rectilinear 2D grid. The key idea is to perform linear interpolation first in one direction, and then again in the other direction (see Fig. 10).

Fig. 10 The four red dots show the data points and the green dot is the point at which we want to interpolate

Suppose that we want to find the value of the unknown function f at the point \((x, y)\). It is assumed that we know the value of f at the four points \(Q_{11}=(x_1, y_1)\), \(Q_{12}=(x_1,y_2)\), \(Q_{21}=(x_2,y_1)\), and \(Q_{22}=(x_2,y_2)\). We first do linear interpolation in the x-direction:

$$\begin{aligned}\begin{array}{l} f\left( x,{y_1}\right) = \frac{{{x_2} - x}}{{{x_2} - {x_1}}}f\left( {Q_{11}}\right) + \frac{{x - {x_1}}}{{{x_2} - {x_1}}}f\left( {Q_{21}}\right) \\ f\left( x,{y_2}\right) = \frac{{{x_2} - x}}{{{x_2} - {x_1}}}f\left( {Q_{12}}\right) + \frac{{x - {x_1}}}{{{x_2} - {x_1}}}f\left( {Q_{22}}\right) \end{array}\end{aligned}$$

then proceed by interpolating in the y-direction to obtain the desired estimate:

$$\begin{aligned}f(x,y) = \frac{{{y_2} - y}}{{{y_2} - {y_1}}}f\left( x,{y_1}\right) + \frac{{y - {y_1}}}{{{y_2} - {y_1}}}f\left( x,{y_2}\right) \end{aligned}$$
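As a concrete illustration of the two interpolation steps above, the following Python sketch evaluates f at \((x, y)\) from its values at the four corner points. The function name and the example corner values are ours for illustration and are not taken from the paper.

```python
def bilinear_interpolate(x, y, x1, x2, y1, y2, f11, f21, f12, f22):
    """Interpolate f at (x, y) from its values at the four corners
    Q11=(x1, y1), Q21=(x2, y1), Q12=(x1, y2), Q22=(x2, y2)."""
    # Step 1: linear interpolation in the x-direction, at y1 and at y2.
    f_x_y1 = (x2 - x) / (x2 - x1) * f11 + (x - x1) / (x2 - x1) * f21
    f_x_y2 = (x2 - x) / (x2 - x1) * f12 + (x - x1) / (x2 - x1) * f22
    # Step 2: linear interpolation in the y-direction between the two results.
    return (y2 - y) / (y2 - y1) * f_x_y1 + (y - y1) / (y2 - y1) * f_x_y2


# Example: interpolate inside the unit square with illustrative corner values.
print(bilinear_interpolate(0.25, 0.75, 0.0, 1.0, 0.0, 1.0,
                           f11=11.0, f21=-30.0, f12=-30.0, f22=7.0))
```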

Parameter setting

Table 1 lists the parameters used for the results presented in this section. Together with the detailed algorithmic description of SCC-rFMQ, these parameter settings make all of the presented results reproducible. For MAPPO, we used the official code and parameters directly (GitHub, Footnote 2). For the other algorithms, we selected the values with the best performance after extensive simulations. In Table 1, \(\epsilon\) and \(\epsilon _i^{re}(s)\) are strategy variables for SCC-rFMQ and rFMQ, defined as functions that decrease as the number of learning trials t increases. The sets \(A(0)_n\) defined in SCC-rFMQ, SMC, and rFMQ are initial action sets evenly sampled from the action space [0, 1], where n is the sample-set size. For SMC and rFMQ, the common parameters (e.g., \(\alpha _Q\) and \(\gamma\)) are set to the same values as in SCC-rFMQ for a fair comparison. The definitions of \(\sigma\) and \(\tau\) are the same as in [12], and those of \(\lambda\) and \(\sigma _L\) are the same as in [8]. The parameter settings of MADDPG and DDPG are the same as in [18], where policies and critics are parameterized by a two-layer ReLU MLP followed by a fully connected output layer (with a tanh activation for the policy networks). The parameter settings of L-DDQN and H-DDQN are the same as in [25]. It should be noted that, in our experiments, parameter changes did not significantly affect the conclusions. For SMC+rFMQ, we use rFMQ with an \(\epsilon\)-greedy strategy to learn the Q values and use the SMC resampling strategy to update the action set every \(c=200\) episodes. The weight \(w_{i}^{t+1}(s,a)\) used in the SMC resampling strategy is calculated by the Boltzmann exploration strategy, where \(\varDelta {\mathrm{Q}}_i^{t + 1}(s,a) = {\mathrm{Q}}_i^{t + 1}(s,a) - {\mathrm{Q}}_i^t(s,a)\).

Table 1 Parameter setting
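For concreteness, the following Python sketch shows one standard way to compute such Boltzmann (softmax) resampling weights from the \(\varDelta Q\) values of a sampled action set. The function name, the temperature value, and the example inputs are ours for illustration and are not taken from the paper or its released code.

```python
import numpy as np


def boltzmann_resampling_weights(delta_q, tau=0.1):
    """Softmax weights over a sampled action set, assuming w(a) is
    proportional to exp(dQ(s, a) / tau); names here are illustrative."""
    logits = np.asarray(delta_q, dtype=float) / tau
    logits -= logits.max()  # subtract the max for numerical stability
    w = np.exp(logits)
    return w / w.sum()


# Example: delta-Q values for a sampled action set of size n = 5.
print(boltzmann_resampling_weights([0.2, -0.1, 0.05, 0.0, 0.3], tau=0.1))
```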

About this article

Cite this article

Zhang, C., Han, Z., Liu, B. et al. SCC-rFMQ: a multiagent reinforcement learning method in cooperative Markov games with continuous actions. Int. J. Mach. Learn. & Cyber. 13, 1927–1944 (2022). https://doi.org/10.1007/s13042-021-01497-0
