Multi-Agent Reinforcement Learning Algorithm with Variable Optimistic-Pessimistic Criterion

Akchurina, Natalia

doi:10.3233/978-1-58603-891-5-433

Abstract

A reinforcement learning algorithm for multi-agent systems based on a variable form of Hurwicz's optimistic-pessimistic criterion is proposed. A formal proof of its convergence is given. Hurwicz's criterion makes it possible to embed prior knowledge of how friendly the environment in which the agent is supposed to function will be. Thorough testing of the developed algorithm against well-known reinforcement learning algorithms has shown that in many cases its successful performance can be explained by its tendency to force the other agents to follow the policy that is more profitable for it. In addition, the variability of Hurwicz's criterion allowed the algorithm to converge to a best response against opponents with stationary policies.
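
For readers unfamiliar with the decision rule named above, the following is a minimal sketch of the classical Hurwicz criterion only, not of the paper's learning algorithm. It scores each of an agent's actions as a weighted mix of its best-case and worst-case payoff. The function names, the `optimism` parameter (conventions differ on whether the weight denotes optimism or pessimism), and the payoff numbers are all illustrative assumptions and do not come from the paper.

```python
from typing import Sequence


def hurwicz_value(payoffs: Sequence[float], optimism: float) -> float:
    """Hurwicz score of one action: optimism * best case + (1 - optimism) * worst case.

    `optimism` is a hypothetical name for the weight in [0, 1]:
    0 gives the purely pessimistic (maximin) score, 1 the purely optimistic (maximax) one.
    """
    if not 0.0 <= optimism <= 1.0:
        raise ValueError("optimism must lie in [0, 1]")
    return optimism * max(payoffs) + (1.0 - optimism) * min(payoffs)


def hurwicz_choice(payoff_rows: Sequence[Sequence[float]], optimism: float) -> int:
    """Index of the action (row) with the highest Hurwicz score; ties go to the first."""
    scores = [hurwicz_value(row, optimism) for row in payoff_rows]
    return max(range(len(scores)), key=scores.__getitem__)


# Illustrative payoff table (made-up numbers): rows are the agent's own actions,
# columns are the possible behaviours of the other agents.
PAYOFFS = [
    [4.0, 0.0],  # action 0: high reward if the others cooperate, nothing otherwise
    [2.0, 1.0],  # action 1: a safer, more modest payoff either way
]

if __name__ == "__main__":
    for optimism in (0.0, 0.5, 1.0):
        print(optimism, hurwicz_choice(PAYOFFS, optimism))
```

With these numbers, a fully pessimistic agent (`optimism = 0`) prefers the safe action 1, while `optimism = 0.5` or `1` favours action 0. This is one way to read the abstract's remark about encoding how friendly the environment is expected to be: a friendlier environment would correspond to a larger weight on the best case.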

IOS Press Copyright 2025
