No-regret learning for repeated concave games with lossy bandits


Abstract:

This paper considers no-regret learning for repeated continuous-kernel games with lossy bandit information. At each round, each player chooses an action perturbed around its intended action and receives the utility value at the resulting action profile. However, due to various uncertainties or high querying costs, the bandit feedback may be lost at random. We therefore study asynchronous learning strategies by which the players adaptively adjust their next actions to minimize the long-term regret relative to the best fixed action in hindsight. The paper proposes a novel no-regret learning algorithm, called Reweighted Online Gradient Descent with bandit (ROGD-b). We first provide a regret analysis for continuous concave games with differentiable and Lipschitz utilities. Furthermore, we show that the action profile converges to the Nash equilibrium with probability 1 when the game is strictly monotone. Numerical experiments illustrate the performance of the algorithm.
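The paper's pseudocode is not reproduced on this page. As an illustration only, below is a minimal Python sketch of one round of a reweighted online gradient step with lossy one-point bandit feedback. It assumes the "reweighting" in ROGD-b is inverse-probability weighting of the one-point gradient estimate by the feedback-arrival probability; the action set, utilities, and parameters here are hypothetical choices, not the authors' construction.

import numpy as np

def rogd_b_step(x, utility, p_feedback, eta, delta, rng, radius=1.0):
    """One round of reweighted gradient ascent with lossy one-point bandit feedback."""
    d = x.size
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)               # random unit perturbation direction
    played = x + delta * u               # action perturbed around the intended action

    if rng.random() < p_feedback:        # bandit feedback arrives this round
        # one-point gradient estimate, reweighted by 1/p_feedback to
        # compensate for the randomly lost feedback
        g_hat = (d / delta) * utility(played) * u / p_feedback
    else:                                # feedback lost: skip the update
        g_hat = np.zeros(d)

    x_next = x + eta * g_hat             # ascent step on the player's own utility
    norm = np.linalg.norm(x_next)        # project onto a ball (illustrative action set)
    if norm > radius:
        x_next *= radius / norm
    return x_next

# Two players in a simple strictly monotone quadratic game (hypothetical example)
rng = np.random.default_rng(0)
x = [np.array([0.5]), np.array([-0.5])]
for t in range(1, 2001):
    a1, a2 = x[0][0], x[1][0]
    eta = 0.5 / t                        # diminishing step size
    x = [rogd_b_step(x[0], lambda a: -(a[0] - 0.2)**2 - 0.1 * a[0] * a2, 0.7, eta, 0.1, rng),
         rogd_b_step(x[1], lambda a: -(a[0] + 0.2)**2 - 0.1 * a[0] * a1, 0.7, eta, 0.1, rng)]

Skipping the update on rounds with lost feedback is what makes the players' learning asynchronous, while the 1/p_feedback reweighting keeps the surviving gradient estimates consistent in expectation.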
Date of Conference: 14-17 December 2021
Date Added to IEEE Xplore: 01 February 2022
Conference Location: Austin, TX, USA
