Abstract
We consider a group of several non-Bayesian agents that can fully coordinate their activities and share their past experience in order to achieve a joint goal in the face of uncertainty. The reward obtained by each agent is a function of the environment state but not of the actions taken by the other agents in the group. The environment state (controlled by Nature) may change arbitrarily, and the reward function is initially unknown. Two basic feedback structures are considered. In one of them — the perfect monitoring case — the agents are able to observe the previous environment state as part of their feedback, while in the other — the imperfect monitoring case — all that is available to the agents are the rewards obtained. Both of these settings refer to partially observable processes, where the current environment state is unknown. Our study adopts the competitive ratio criterion. It is shown that, for the imperfect monitoring case, there exists an efficient stochastic policy that ensures that the competitive ratio is obtained for all agents at almost all stages with arbitrarily high probability, where efficiency is measured in terms of rate of convergence. It is also shown that if the agents are restricted to deterministic policies then no such policy exists, even in the perfect monitoring case.
Monderer, D., Tennenholtz, M. Dynamic non-Bayesian decision making in multi-agent systems. Annals of Mathematics and Artificial Intelligence 25, 91–106 (1999). https://doi.org/10.1023/A:1018917719749