Fast convergence of learning in games (invited talk)

ABSTRACT
A plethora of recent work has analyzed properties of outcomes in games when each player employs a no-regret learning algorithm. Many algorithms achieve regret against the best fixed action in hindsight that decays at a rate of O(1/√T) when the game is played for T iterations, and this rate is optimal in adversarial settings. In a game, however, a player's opponents are minimizing their own regret rather than maximizing the player's regret. Daskalakis et al. (2014) and Rakhlin and Sridharan (2013) showed that in two-player zero-sum games, O(1/T) rates are achievable. In Syrgkanis et al. (2015), we show that O(1/T^{3/4}) rates are achievable in general multi-player games, and we also analyze convergence of the dynamics to approximately optimal social welfare, where we show a convergence rate of O(1/T). The latter result was subsequently generalized to a broader class of learning algorithms by Foster et al. (2016). This talk is based on joint work with Alekh Agarwal, Haipeng Luo, and Robert E. Schapire.
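The abstract's central setting, where each player runs a no-regret algorithm against other no-regret learners rather than an adversary, can be illustrated with a short self-play simulation. The sketch below is not from the talk; it assumes the standard Hedge (multiplicative-weights) update of Freund and Schapire (1997) on a small two-action zero-sum game chosen here for illustration. The learning rate η = √(ln K / T) yields the baseline O(1/√T) average-regret guarantee mentioned above, not the faster optimistic rates that are the subject of the talk:

```python
import math

def hedge_weights(cum_losses, eta):
    # Hedge / multiplicative weights: w_i ∝ exp(-eta * cumulative loss of action i).
    # Subtracting the min loss keeps the exponentials numerically stable.
    m = min(cum_losses)
    exps = [math.exp(-eta * (l - m)) for l in cum_losses]
    z = sum(exps)
    return [e / z for e in exps]

# A small zero-sum game (row player's loss matrix; the column player's loss is its negation).
A = [[0.0, 1.0],
     [0.8, 0.2]]

T = 10_000
eta = math.sqrt(math.log(2) / T)  # standard tuning for an O(1/sqrt(T)) regret rate
cum_row = [0.0, 0.0]  # row player's cumulative per-action losses
cum_col = [0.0, 0.0]  # column player's cumulative per-action losses
total_loss = 0.0      # row player's realized (expected) loss over the play

for _ in range(T):
    p = hedge_weights(cum_row, eta)  # row player's mixed strategy
    q = hedge_weights(cum_col, eta)  # column player's mixed strategy
    # Expected loss of each pure action against the opponent's current mixture.
    row_losses = [sum(A[i][j] * q[j] for j in range(2)) for i in range(2)]
    col_losses = [sum(-A[i][j] * p[i] for i in range(2)) for j in range(2)]
    total_loss += sum(p[i] * row_losses[i] for i in range(2))
    for i in range(2):
        cum_row[i] += row_losses[i]
        cum_col[i] += col_losses[i]

# Average regret against the best fixed action in hindsight; decays like O(1/sqrt(T)).
avg_regret = (total_loss - min(cum_row)) / T
print(f"average regret: {avg_regret:.4f}")
```

Because both players are regret-minimizing rather than adversarial, the empirical play concentrates near the game's equilibrium, which is exactly the self-play structure the faster optimistic rates of the talk exploit.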
REFERENCES
- Avrim Blum, MohammadTaghi Hajiaghayi, Katrina Ligett, and Aaron Roth. 2008. Regret Minimization and the Price of Total Anarchy. In Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing (STOC '08). ACM, New York, NY, USA, 373–382.
- Nicolò Cesa-Bianchi and Gábor Lugosi. 2006. Prediction, Learning, and Games. Cambridge University Press, New York, NY, USA.
- Constantinos Daskalakis, Alan Deckelbaum, and Anthony Kim. 2014. Near-optimal no-regret algorithms for zero-sum games. Games and Economic Behavior (2014).
- Dylan J. Foster, Thodoris Lykouris, Karthik Sridharan, Éva Tardos, et al. 2016. Learning in games: Robustness of fast convergence. In Advances in Neural Information Processing Systems. 4727–4735.
- Yoav Freund and Robert E. Schapire. 1997. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. J. Comput. System Sci. 55, 1 (1997), 119–139.
- Drew Fudenberg and David K. Levine. 1998. The Theory of Learning in Games. MIT Press, Cambridge, MA.
- Jyrki Kivinen and Manfred K. Warmuth. 1997. Exponentiated Gradient Versus Gradient Descent for Linear Predictors. Inf. Comput. 132, 1 (Jan. 1997), 1–63.
- Nick Littlestone and Manfred K. Warmuth. 1994. The weighted majority algorithm. Information and Computation 108, 2 (1994), 212–261.
- Alexander Rakhlin and Karthik Sridharan. 2013. Online Learning with Predictable Sequences. In COLT 2013. 993–1019.
- Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, and Robert E. Schapire. 2015. Fast convergence of regularized learning in games. In Advances in Neural Information Processing Systems. 2989–2997.