Learning in Two-Player Matrix Games by Policy Gradient Lagging Anchor

Shiyao DING; Toshimitsu USHIO

doi:10.1587/transfun.E102.A.708

Abstract

It is known that the policy gradient algorithm cannot guarantee convergence to a Nash equilibrium in mixed policies when it is applied to matrix games. To overcome this problem, we propose a novel multi-agent reinforcement learning (MARL) algorithm called the policy gradient lagging anchor (PGLA) algorithm. We prove that, with the PGLA algorithm, the agents' policies can converge to a Nash equilibrium in mixed policies in two-player two-action matrix games. Simulations confirm this convergence and show that the PGLA algorithm exhibits better convergence than the L_R-I lagging anchor algorithm.
