DOI: 10.1145/1160633.1160772

Multi-agent reinforcement learning algorithm to handle beliefs of other agents' policies and embedded beliefs

Published: 08 May 2006

ABSTRACT

We have developed a new series of multi-agent reinforcement learning algorithms that choose a policy based on beliefs about co-players' policies. The algorithms apply to situations where the state is fully observable to the agents, and they place no limit on the number of players. Some of the algorithms employ embedded beliefs to handle cases in which co-players also choose their policies based on beliefs about others' policies. Simulation experiments on Iterated Prisoner's Dilemma games show that the algorithms using policy-based beliefs converge to highly mutually cooperative behavior, unlike existing algorithms based on action-based beliefs.
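The paper defines the algorithms precisely; as a rough illustration of the underlying idea only, the Python sketch below maintains a Bayesian posterior over a small hypothetical set of candidate co-player policies (always-cooperate, always-defect, tit-for-tat) in the Iterated Prisoner's Dilemma and best-responds to the most probable one. The candidate set, the smoothing constant, and the best-response table are assumptions made here for illustration, not the paper's construction, and embedded beliefs are not modeled.

```python
# Illustrative sketch only (not the paper's algorithm): one simple way to
# realize a "policy-based belief" in the Iterated Prisoner's Dilemma.
# The agent keeps a Bayesian posterior over a hypothetical set of candidate
# co-player policies and best-responds to the most probable one.

import random

C, D = "C", "D"

# Candidate co-player policies: each maps the joint history to P(co-player plays C).
def always_cooperate(history):
    return 1.0

def always_defect(history):
    return 0.0

def tit_for_tat(history):
    # Cooperates on the first move, then mirrors our previous move.
    if not history:
        return 1.0
    return 1.0 if history[-1][0] == C else 0.0

CANDIDATES = {"AllC": always_cooperate, "AllD": always_defect, "TFT": tit_for_tat}

# Hypothetical best response to each believed policy: defect against
# unconditional policies, reciprocate with tit-for-tat.
BEST_RESPONSE = {"AllC": D, "AllD": D, "TFT": C}

def update_belief(belief, history, observed):
    """Bayes update of the posterior over candidate policies after observing
    one co-player move; eps-smoothing keeps deterministic candidates alive."""
    eps = 1e-3
    posterior = {}
    for name, policy in CANDIDATES.items():
        p_c = min(max(policy(history), eps), 1.0 - eps)
        posterior[name] = belief[name] * (p_c if observed == C else 1.0 - p_c)
    total = sum(posterior.values())
    return {name: p / total for name, p in posterior.items()}

def choose_action(belief):
    # Best-respond to the most probable candidate policy.
    return BEST_RESPONSE[max(belief, key=belief.get)]

# Demo: play 10 rounds against a tit-for-tat co-player.
belief = {name: 1.0 / len(CANDIDATES) for name in CANDIDATES}
history = []  # list of (my_move, co_player_move) pairs
for t in range(10):
    my_move = choose_action(belief)
    co_move = C if random.random() < tit_for_tat(history) else D
    belief = update_belief(belief, history, co_move)
    history.append((my_move, co_move))
    print(t, my_move, co_move, {k: round(v, 2) for k, v in belief.items()})
```

In this toy run the posterior concentrates on TFT within a couple of rounds and play settles into mutual cooperation, loosely mirroring the convergence to mutually cooperative behavior the abstract reports for policy-based beliefs. An action-based belief, by contrast, would model only the co-player's next action rather than the policy generating it.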


Published in

AAMAS '06: Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems
May 2006
1631 pages
ISBN: 1595933034
DOI: 10.1145/1160633

Copyright © 2006 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • Article

    Acceptance Rates

Overall Acceptance Rate: 1,155 of 5,036 submissions, 23%
