Abstract:
Bertsekas recently proposed Asynchronous Policy Iteration (API) as an alternative to Policy Iteration (PI) for solving two-player zero-sum Markov games. To quantify the benefits of API beyond its flexibility for parallel and asynchronous implementation, this paper derives its computational complexity. We show that, to reach within $\epsilon$ of the optimal value function, the computational complexity of API is at most $O(\mathrm{poly}(n, m_1, m_2, \ln(1/(1-\gamma))))$, where $n$ is the number of states, $m_1$ and $m_2$ are the numbers of actions for player 1 and player 2 respectively, and $\gamma$ is the discount factor.
Published in: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 14-19 April 2024
Date Added to IEEE Xplore: 18 March 2024