Abstract
We propose a novel adaptive reinforcement learning (RL) procedure for multiagent non-cooperative repeated games. Most existing regret-based algorithms use only positive regrets in their learning updates. In this paper, we use both positive and negative regrets to improve the convergence behaviour of the learning procedure. We prove that the empirical distribution of joint play converges to the set of correlated equilibria. Simulation results demonstrate that the proposed procedure outperforms both the standard regret-based RL approach and a well-known state-of-the-art RL scheme from the literature in terms of computational requirements and system fairness. Further experiments demonstrate that the performance of our solution is robust to variations in the total number of agents in the system, and that it achieves markedly better fairness than other relevant methods, especially in large-scale multiagent systems.
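To illustrate what "using only positive regrets" means in a standard regret-based scheme, the following is a minimal sketch of the generic positive-part (regret-matching) rule. It is not the procedure proposed in this paper, whose update is not given in the abstract. The function name `regret_matching_probs` and the uniform fallback when no regret is positive are assumptions made for illustration.

```python
def regret_matching_probs(cumulative_regrets):
    """Map a list of per-action regrets to a mixed strategy.

    Generic positive-part rule (illustrative, not the paper's method):
    each action is played with probability proportional to max(regret, 0).
    """
    # Negative regrets are clipped to zero, so their magnitude is discarded.
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    n = len(positive)
    if total <= 0.0:
        # Assumption: play uniformly when no action has positive regret.
        return [1.0 / n] * n
    return [p / total for p in positive]


if __name__ == "__main__":
    # A regret of -5.0 and a regret of -0.1 lead to the same strategy.
    print(regret_matching_probs([2.0, -5.0, 1.0]))
    print(regret_matching_probs([2.0, -0.1, 1.0]))
```

Under this rule an action with a large negative regret is treated exactly like one with a slightly negative regret. That clipped information is what the abstract proposes to use.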
Acknowledgment
This research is partially supported by the Australian Research Council Linkage Grant LP100200493.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Nguyen, D.D., White, L.B., Nguyen, H.X. (2016). Adaptive Multiagent Reinforcement Learning with Non-positive Regret. In: Kang, B.H., Bai, Q. (eds) AI 2016: Advances in Artificial Intelligence. AI 2016. Lecture Notes in Computer Science, vol 9992. Springer, Cham. https://doi.org/10.1007/978-3-319-50127-7_3
DOI: https://doi.org/10.1007/978-3-319-50127-7_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50126-0
Online ISBN: 978-3-319-50127-7
eBook Packages: Computer Science, Computer Science (R0)